Abstract 

We resolve several fundamental questions in the area of distributed functional monitoring, initiated 
by Cormode, Muthukrishnan, and Yi (SODA, 2008), and receiving recent attention. In this model there 
are $k$ sites each tracking their input and communicating with a central coordinator. The coordinator's 
task is to continuously maintain an approximate output to a function $f$ computed over the union of the 
inputs. The goal is to minimize the number of bits communicated. 

We show the randomized communication complexity of estimating the number of distinct elements 
(the zero-th frequency moment $F_0$) up to a $1 + \epsilon$ factor is $\Omega(k/\epsilon^2)$, improving upon the previous 
$\Omega(k + 1/\epsilon^2)$ bound and matching known upper bounds. For the $p$-th frequency moment $F_p$, $p > 1$, 
we improve the previous $\Omega(k + 1/\epsilon^2)$ communication bound to $\Omega(k^{p-1}/\epsilon^2)$. We obtain similar 
improvements for heavy hitters, empirical entropy, and other problems. Our lower bounds are the 
first of any kind in distributed functional monitoring to depend on the product of $k$ and $1/\epsilon^2$. 
Moreover, the lower bounds are for the static $k$-party number-in-hand communication model; surprisingly 
they almost match what is achievable in the dynamic model. We also show that we can estimate $F_p$, 
for any $p > 1$, using $\tilde{O}(k^{p-1}\,\mathrm{poly}(\epsilon^{-1}))$ communication. This drastically improves upon the 
previous $\tilde{O}(k^{2p+1} N^{1-2/p}\,\mathrm{poly}(\epsilon^{-1}))$ bound of Cormode, Muthukrishnan, and Yi for general $p$, and their 
$\tilde{O}(k^2/\epsilon + k^{1.5}/\epsilon^3)$ bound for $p = 2$. For $p = 2$, our bound resolves their main open question. 

Our lower bounds are based on new direct sum theorems for approximate majority, and yield significant 
improvements to classical problems in the standard data stream model. First, we improve the known 
lower bound for estimating $F_p$, $p > 2$, in $t$ passes from $\Omega(n^{1-2/p}/(\epsilon^{2/p} t))$ to $\Omega(n^{1-2/p}/(\epsilon^{4/p} t))$, giving 
the first bound that matches what we expect when $p = 2$. Second, we give the first lower bound for 
estimating $F_0$ in $t$ passes with $\Omega(1/(\epsilon^2 t))$ bits of space that does not use the hardness of the gap-hamming 
problem. Third, we propose a useful distribution for the gap-hamming problem with high information 
cost or super-polynomial communication, partly answering Question 25 in the Open Problems in Data 
Streams list from the Bertinoro and IITK workshops, and we demonstrate several applications of this. 



1 Introduction 



Recent applications in sensor networks and distributed systems have motivated the distributed functional 
monitoring model, initiated by Cormode, Muthukrishnan, and Yi [20]. In this model there are $k$ sites and 
a single central coordinator. Each site receives a stream of data and the coordinator wants to continuously 
keep track of global properties of the union of the k data streams. The goal is to minimize the amount of 
communication between the sites and the coordinator so that the coordinator can approximately maintain a 
global property of the streams. This is motivated by power constraints, since communication typically uses 
a power-hungry radio [25]. There is a large body of work on monitoring problems in this model, including 
maintaining a random sample [21,48], estimating frequency moments [18,20], finding the heavy hitters 
[6, 40, 43, 51], approximating the quantiles [19, 32, 51], and estimating the entropy [5]. 

We think of the $k$ sites having $N$-dimensional vectors $v^1, \ldots, v^k$, respectively. This is the so-called 
number-in-hand communication model. There are two refinements to this model we will consider: the 
blackboard model, in which each message a site sends is received by all other sites, i.e., it is broadcast, 
and the message-passing model, in which each message is between the coordinator and a specific site¹. An 
update to a coordinate $j$ on site $i$ causes $v^i_j$ to increase by 1. The goal is to estimate a statistic of $v = \sum_{i=1}^{k} v^i$, 
such as the $p$-th frequency moment $F_p = \|v\|_p^p$, the number of distinct elements $F_0 = |\mathrm{support}(v)|$, the 
entropy $H = \sum_j \frac{v_j}{\|v\|_1} \log \frac{\|v\|_1}{v_j}$, etc. This is the standard insertion-only model. For many of these problems, 
with the exception of the empirical entropy, there are strong lower bounds (e.g., $\Omega(N)$) if allowing updates 
to coordinates that cause $v^i_j$ to decrease [5]. The latter is called the update model. Thus, except for entropy, 
we follow previous work and consider the insertion-only model. 
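To make the statistics above concrete, here is a minimal Python sketch that forms the combined vector from the sites' vectors and evaluates $F_0$, $F_p$ and the empirical entropy on it. The function names are hypothetical and chosen for illustration, base-2 logarithms are assumed for the entropy, and the sketch computes the exact quantities offline; it is not a monitoring protocol.

```python
import math

def combine(site_vectors):
    """Coordinate-wise sum v = v^1 + ... + v^k of the sites' vectors."""
    return [sum(column) for column in zip(*site_vectors)]

def frequency_moment(v, p):
    """F_p = sum over nonzero coordinates of v_j^p (assumes p > 0)."""
    return sum(x ** p for x in v if x > 0)

def distinct_elements(v):
    """F_0 = size of the support of v."""
    return sum(1 for x in v if x > 0)

def empirical_entropy(v):
    """H = sum_j (v_j / m) * log2(m / v_j), where m = ||v||_1."""
    m = sum(v)
    return sum((x / m) * math.log2(m / x) for x in v if x > 0)

# Two sites, universe size N = 3 (toy data made up for this example).
sites = [[1, 0, 2], [0, 0, 1]]
v = combine(sites)
f0 = distinct_elements(v)
f2 = frequency_moment(v, 2)
h = empirical_entropy(v)
```

In the monitoring setting the coordinator never sees `v` directly; the bounds in this paper concern how many bits must be exchanged to approximate such quantities.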

Despite the large body of work in this model, the complexity of these problems is not well understood. 
For example, for estimating $F_0$ up to a $(1 + \epsilon)$-factor, the best upper bound is $\tilde{O}(k/\epsilon^2)$ [20]², while the only 
known lower bound is $\Omega(k + 1/\epsilon^2)$. The dependence on $\epsilon$ in the lower bound is not very insightful, as the 
$\Omega(1/\epsilon^2)$ bound follows just by considering two sites [5, 16]. The real question is whether the $k$ and $1/\epsilon^2$ 
factors should multiply. Even more embarrassingly, for the frequency moments $F_p$, $p > 2$, the known 
algorithms use $\tilde{O}(k^{2p+1} N^{1-2/p}\,\mathrm{poly}(1/\epsilon))$ communication, while the only known lower bound is $\Omega(k + 1/\epsilon^2)$ 
[5, 16]. Even for $p = 2$, the best known upper bound is $\tilde{O}(k^2/\epsilon + k^{1.5}/\epsilon^3)$ [20], and the authors' main open 
question in their paper is "It remains to close the gap in the $F_2$ case: can a better lower bound than $\Omega(k)$ be 
shown, or do there exist $\tilde{O}(k \cdot \mathrm{poly}(1/\epsilon))$ solutions?" 

Our Results: We significantly improve the previous communication bounds for approximating the fre- 
quency moments, entropy, heavy hitters, and quantiles in the distributed functional monitoring model. In 
many cases our bounds are optimal. Our results are summarized in Table 1, where they are compared with 
previous bounds. We have three main results, each introducing a new technique: 

1. We show that for estimating $F_0$ in the message-passing model, $\Omega(k/\epsilon^2)$ communication is required, 
matching an upper bound of [20] up to a polylogarithmic factor. Our lower bound holds in the static 
model in which the $k$ sites just need to approximate $F_0$ once on their inputs. 

2. We show that we can estimate $F_p$, for any $p > 1$, in $\tilde{O}(k^{p-1}\,\mathrm{poly}(\epsilon^{-1}))$ communication in the 
message-passing model³. This drastically improves upon the previous $\tilde{O}(k^{2p+1} N^{1-2/p}\,\mathrm{poly}(\epsilon^{-1}))$ 
bound of [20]. In particular, setting $p = 2$, we resolve the main open question of [20]. 

¹ This is equivalent to allowing any two sites to communicate with each other at a time, up to a factor of $\log k$. 
² We use $\tilde{O}(f)$ to denote a function of the form $f \cdot \log^{O(1)}(Nk/\epsilon)$. 
³ We assume the total number of updates is $\mathrm{poly}(N)$. 
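| Problem | LB (previous work) | LB (this paper, all static) | UB (previous work) | UB (this paper) |
|---|---|---|---|---|
| $F_0$ | $\Omega(k)$ [20] | $\Omega(k/\epsilon^2)$ | $\tilde{O}(k/\epsilon^2)$ [20] | - |
| $F_2$ | $\Omega(k)$ [20] | $\Omega(k/\epsilon^2)$ (BB) | $\tilde{O}(k^2/\epsilon + k^{1.5}/\epsilon^3)$ [20] | $\tilde{O}(k \cdot \mathrm{poly}(1/\epsilon))$ |
| $F_p$ ($p > 1$) | $\Omega(k + 1/\epsilon^2)$ [5, 16] | $\Omega(k^{p-1}/\epsilon^2)$ (BB) | $\tilde{O}(k^{2p+1} N^{1-2/p}\,\mathrm{poly}(1/\epsilon))$ [20] | $\tilde{O}(k^{p-1}\,\mathrm{poly}(1/\epsilon))$ |
| All-quantile | $\Omega(\min\{\ldots, \ldots\})$ [32] | $\Omega(\min\{\ldots, \ldots\})$ (BB) | $O(\min\{\ldots, \ldots\})$ [32] | - |
| Heavy Hitters | $\Omega(\min\{\ldots, \ldots\})$ [32] | $\Omega(\min\{\ldots, \ldots\})$ (BB) | $O(\min\{\ldots, \ldots\})$ [32] | - |
| Entropy | $\Omega(1/\sqrt{\epsilon})$ [5] | $\Omega(k/\epsilon^2)$ (BB) | $O(\ldots)$ [5], $\ldots$ (static) [31] | - |
| $\ell_p$ ($p \in [0, 2]$) | - | $\Omega(k/\epsilon^2)$ (BB) | $\tilde{O}(k/\epsilon^2)$ (static) [38] | - |
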

Table 1: UB denotes upper bound; LB denotes lower bound; BB denotes blackboard model. N denotes 
the universe size. All bounds are for randomized algorithms. We assume all bounds hold in the dynamic 
setting by default, and will state explicitly if they hold in the static setting. For lower bounds we assume the 
message-passing model by default, and state explicitly if they also hold in the blackboard model. 



3. We show $\Omega(k^{p-1}/\epsilon^2)$ communication is necessary for approximating $F_p$ ($p > 1$) in the blackboard 
model, significantly improving the prior $\Omega(k + 1/\epsilon^2)$ bound. As with our lower bound for $F_0$, these 
are the first lower bounds which depend on the product of $k$ and $1/\epsilon$. As with $F_0$, our lower bound 
holds in the static model in which the sites just approximate $F_p$ once. 

Our other results in Table 1 are explained in the body of the paper, and use similar techniques. 

Our Techniques: Lower Bound for $F_0$: Our $\Omega(k/\epsilon^2)$ bound for $F_0$ is based on the following primitive 
problem $k$-GAP-MAJ. For illustration, suppose $k = 1/\epsilon^2$. There are $k$ sites each holding a random 
independent bit. Their task is to decide if at least $1/(2\epsilon^2) + 1/\epsilon$ of the bits are 1, or at most $1/(2\epsilon^2) - 1/\epsilon$ of 
the bits are 1. We show any correct protocol must reveal $\Omega(1/\epsilon^2)$ bits of information about the sites' inputs. 
We "compose" this with 2-party disjointness (2-DISJ) [46], in which each party has a bitstring of length 
$1/\epsilon^2$ and either the strings have disjoint support (the solution is 0) or there is a single coordinate which is 1 
in both strings (the solution is 1). Let $\tau$ be the hard distribution, shown to require $\Omega(1/\epsilon^2)$ communication 
to solve [46]. Suppose the coordinator and each site share an instance of 2-DISJ in which the solution to 
2-DISJ is a random bit, which is the site's effective input to $k$-GAP-MAJ. The coordinator has the same 
input for each of the $1/\epsilon^2$ instances, while the sites have an independent input drawn from $\tau$ conditioned 
on the coordinator's input and output bit determined by $k$-GAP-MAJ. The inputs are chosen so that if the 
output of 2-DISJ is 1, then $F_0$ increases by 1, otherwise it remains the same. This is not entirely accurate, 
but it illustrates the main idea. Now, the key is that by the rectangle property of $k$-party communication 
protocols, the $1/\epsilon^2$ different output bits are independent conditioned on the transcript. Thus if a protocol 
does not reveal $\Omega(1/\epsilon^2)$ bits of information about these output bits, by an anti-concentration theorem we can 
show that the protocol cannot succeed with large probability. Finally, since a $(1 + \epsilon)$-approximation to $F_0$ 
can decide $k$-GAP-MAJ, and since any correct protocol for $k$-GAP-MAJ must reveal $\Omega(1/\epsilon^2)$ information, 
the protocol must solve $\Omega(1/\epsilon^2)$ instances of 2-DISJ, each requiring $\Omega(1/\epsilon^2)$ communication (otherwise 
the coordinator could simulate $k - 1$ of the sites and obtain an $o(1/\epsilon^2)$-communication protocol for 2-DISJ 
with the remaining site, contradicting the communication lower bound for 2-DISJ on this distribution). We 
obtain an $\Omega(k/\epsilon^2)$ bound by using similar arguments. One cannot show this in the blackboard model since 
there is an $O(k + 1/\epsilon^2)$ bound for $F_0$.⁴ 

⁴ The idea is to first obtain a 2-approximation. Then, sub-sample so that there are $\Theta(1/\epsilon^2)$ distinct elements. Then the first party 
Lower Bound for $F_p$: Our $\Omega(k^{p-1}/\epsilon^2)$ bound for $F_p$ cannot use the above reduction since we do not 
know how to turn a protocol for approximating $F_p$ into a protocol for solving the composition of $k$-GAP-MAJ 
and 2-DISJ. Instead, our starting point is a recent $\Omega(1/\epsilon^2)$ lower bound for the 2-party gap-hamming 
distance GHD [16]. Each party has a $1/\epsilon^2$-length bitstring, $x$ and $y$, respectively, and they must decide if 
the Hamming distance $\Delta(x, y) > 1/(2\epsilon^2) + 1/\epsilon$ or $\Delta(x, y) < 1/(2\epsilon^2) - 1/\epsilon$. A simplification by Sherstov 
[47] shows a related problem called 2-GAP-ORT also has $\Omega(1/\epsilon^2)$ communication. Here there are two 
parties, each with $1/\epsilon^2$-length bitstrings $x$ and $y$, and they must decide if $|\Delta(x, y) - 1/(2\epsilon^2)| > 2/\epsilon$ or 
$|\Delta(x, y) - 1/(2\epsilon^2)| < 1/\epsilon$. We observe that Sherstov proves that 2-GAP-ORT is hard when $x$ and $y$ are 
drawn from a product uniform distribution. Therefore, by a simulation result of Barak et al. [9], this implies 
that any correct protocol for 2-GAP-ORT must reveal $\Omega(1/\epsilon^2)$⁵ information about $(x, y)$. By independence 
and the chain rule, this means for $\Omega(1/\epsilon^2)$ indices $i$, $\Omega(1)$ information is revealed about $(x_i, y_i)$ conditioned 
on values $(x_j, y_j)$ for $j < i$. We now "embed" an independent copy of a variant of $k$-party-disjointness, 
the $k$-XOR problem, on each of the $1/\epsilon^2$ coordinates of 2-GAP-ORT. In this variant, there are $k$ parties 
each holding a bitstring of length $k^p$. On all but one "special" randomly chosen coordinate, there is a single 
site assigned to the coordinate and that site uses private randomness to choose whether the value on the 
coordinate is 0 or 1 (with equal probability), and the remaining $k - 1$ sites have 0 on this coordinate. On 
the special coordinate, with probability 1/4 all sites have a 0 on this coordinate (a "00" instance), with 
probability 1/4 the first $k/2$ parties have a 1 on this coordinate and the remaining $k/2$ parties have a 0 (a 
"10" instance), with probability 1/4 the second $k/2$ parties have a 1 on this coordinate and the remaining 
$k/2$ parties have a 0 (a "01" instance), and with the remaining probability 1/4 all $k$ parties have a 1 on this 
coordinate (a "11" instance). We show, via a direct sum for distributional communication complexity, that 
any deterministic protocol that decides which case the special coordinate is in with probability $1/4 + \Omega(1)$ 
has conditional information cost $\Omega(k^{p-1})$, where the information is measured with respect to the input 
distribution. This implies that any protocol that can decide whether the output is in the set $\{10, 01\}$ (the 
"XOR" of the output bits) with probability $1/2 + \Omega(1)$ has conditional information cost $\Omega(k^{p-1})$. We do 
the direct sum argument by conditioning the mutual information on low-entropy random variables which 
allow us to fill in inputs on remaining coordinates without any communication and without asymptotically 
affecting our $\Omega(k^{p-1})$ lower bound. We design a reduction so that on the $i$-th coordinate of 2-GAP-ORT, 
the input of the first $k/2$ players of $k$-XOR is determined by the public coin (which we condition on) and the 
first party's input bit to 2-GAP-ORT, and the input of the second $k/2$ players of $k$-XOR is determined by 
the public coin and the second party's input bit to 2-GAP-ORT. We show that any protocol that solves the 
composition of 2-GAP-ORT with $1/\epsilon^2$ copies of $k$-XOR, a problem that we call $k$-BTX, must reveal $\Omega(1)$ 
bits of information about the two output bits of an $\Omega(1)$ fraction of the $1/\epsilon^2$ copies, and from our $\Omega(k^{p-1})$ 
information cost lower bound for a single copy, we can obtain an overall $\Omega(k^{p-1}/\epsilon^2)$ bound. Finally, one 
can show that a $(1 + \epsilon)$-approximation algorithm for $F_p$ can be used to solve $k$-BTX. 
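The input distribution of the $k$-XOR variant described above can be sketched as a sampler. This is only an illustration under stated assumptions: the function and variable names are hypothetical, the string length is passed in as a parameter rather than fixed, and the site that owns each non-special coordinate is drawn uniformly at random, which the description above does not specify.

```python
import random

def sample_k_xor(k, length, rng):
    """Sample one k-XOR instance: k bitstrings of the given length.

    Assumption: the owner of each non-special coordinate is uniform over sites.
    """
    assert k % 2 == 0, "the description splits the sites into two halves"
    special = rng.randrange(length)             # the "special" coordinate
    case = rng.choice(["00", "10", "01", "11"])  # each with probability 1/4
    inputs = [[0] * length for _ in range(k)]
    for j in range(length):
        if j == special:
            first_half, second_half = int(case[0]), int(case[1])
            for i in range(k // 2):
                inputs[i][j] = first_half
            for i in range(k // 2, k):
                inputs[i][j] = second_half
        else:
            owner = rng.randrange(k)             # single site assigned to j
            inputs[owner][j] = rng.randint(0, 1)  # private fair coin
    return inputs, special, case

def xor_of_output(case):
    """1 exactly when the special coordinate is a "10" or "01" instance."""
    return int(case[0]) ^ int(case[1])

rng = random.Random(0)
inputs, special, case = sample_k_xor(4, 16, rng)
```

On non-special coordinates at most one site holds a 1, while on the special coordinate either zero, half, or all of the sites do, which is the feature a protocol has to detect.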

Upper Bound for Fp: We illustrate the algorithm for p = 2 and constant e. Unlike [20], we do not use 
AMS sketches [4]. A nice property of our protocol is that it is the first 1-way protocol (the protocol of [20] 
is not), in the sense that only the sites send messages to the coordinator (the coordinator does not send any 
messages). Moreover, all messages are simple: if a site receives an update to the j-th coordinate, provided 
the value of coordinate j in its stream exceeds a threshold, it decides with a certain probability to send j to 
the coordinator. To determine the threshold and probability, the sites use the public coin to randomly group 

broadcasts his distinct elements, the second party broadcasts the distinct elements he has that the first party does not, etc. 

⁵ We assume that the communication cost of all protocols in the paper is at most $\mathrm{poly}(N)$, where $N$ is the number of coordinates 
in the vector inputs to the parties, since otherwise the lower bound can be proved directly (this will be discussed in more detail in Section 
4.1). In this case, applying Theorem 1.3 of [9], we have that the external information cost of the protocol is at least $\Omega(1/\epsilon^2)$. 
all coordinates $j$ into buckets $S_\ell$, where $S_\ell$ contains a $1/2^\ell$ fraction of the input coordinates. If $j \in S_\ell$, the 
threshold and probability are only a function of $\ell$. Inspired by work on sub-sampling [33], we try to estimate 
the number of coordinates $j$ of magnitude in $[2^h, 2^{h+1})$, for each $h$. Call this class of coordinates $C_h$. If 
the contribution to $F_2$ from $C_h$ is significant, then $|C_h| \approx 2^{-2h} \cdot F_2$, and to estimate $|C_h|$ we only consider 
those $j \in C_h$ that are in $S_\ell$ for $|C_h| \cdot 2^{-\ell} \approx 2^{-2h} \cdot F_2 \cdot 2^{-\ell} \approx 1$ (we don't know $F_2$ but we can make 
a logarithmic number of guesses). When choosing the threshold and probability we have two competing 
constraints; on the one hand these values must be large enough so that we can accurately estimate $\sum_{i=1}^{k} v^i_j$ 
from the samples. On the other hand, the values need to be small enough so that the communication incurred 
from sampling other coordinates in $S_\ell$ is small. This latter constraint forces us to use a threshold instead of 
just the same probability for all coordinates in $S_\ell$. By choosing the values to be an appropriate function of 
$\ell$ we can satisfy both constraints. The analysis is complicated by the fact that different classes contribute at 
different times, and that the coordinator must be correct at all times. 

Implications for the Data Stream Model: In 2003, Indyk and Woodruff introduced the GHD problem 
[34], where a 1-round lower bound shortly followed [50]. Ever since, it seemed the space complexity of 
estimating $F_0$ in a data stream with $t > 1$ passes hinged on whether GHD required $\Omega(1/\epsilon^2)$ communication 
for $t$ rounds, see, e.g., Question 10 in [2]. A flurry [10, 11, 16, 47, 49] of recent work finally resolved 
the complexity of GHD. What our lower bound shows for $F_0$ is that this is not the only way to prove the 
$\Omega(1/\epsilon^2)$ space bound for multiple passes for $F_0$. Indeed, we just needed to look at $\Theta(1/\epsilon^2)$ parties instead of 
2 parties. Since we have an $\Omega(1/\epsilon^4)$ communication lower bound for $F_0$ with $\Theta(1/\epsilon^2)$ parties, this implies an 
$\Omega((1/\epsilon^4)/(t/\epsilon^2)) = \Omega(1/(t\epsilon^2))$ bound for $t$-pass algorithms for approximating $F_0$. Arguably our proof is 
simpler than the recent GHD lower bounds. 

Our $\Omega(k^{p-1}/\epsilon^2)$ bound for $F_p$ also improves a long line of work on the space complexity of estimating 
$F_p$ for $p > 2$ in a data stream. The current best upper bound is $\tilde{O}(N^{1-2/p}\epsilon^{-2})$ bits of space [28]. See 
Figure 1 of [28] for a list of papers which make progress on the $\epsilon$ and logarithmic factors. The previous best 
lower bound is $\Omega(N^{1-2/p}\epsilon^{-2/p})$ [8] in FOCS, 2002, and no progress has been made since then. By setting 
$k^p = \epsilon^2 N$, we obtain total communication $\Omega(\epsilon^{2-2/p} N^{1-1/p}/\epsilon^2)$, and so the implied space lower bound for 
$t$-pass algorithms for $F_p$ in a data stream is $\Omega(\epsilon^{-2/p} N^{1-1/p}/(tk)) = \Omega(N^{1-2/p}/(\epsilon^{4/p} t))$. This gives the 
first bound that agrees with the tight $\Omega(1/\epsilon^2)$ bound when $p = 2$ and $t$ is constant. 
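The substitution in the paragraph above can be written out step by step. The derivation below is a sketch that assumes the usual simulation argument, which is not spelled out here: a $t$-pass streaming algorithm using $s$ bits of space yields a $k$-party protocol with $O(tks)$ communication, so a communication lower bound divided by $tk$ is a space lower bound.

```latex
\begin{align*}
k^{p} = \epsilon^{2} N
  &\;\Longrightarrow\;
  k = \epsilon^{2/p} N^{1/p},
  \qquad
  k^{p-1} = \epsilon^{2-2/p} N^{1-1/p}, \\
\text{communication}
  &= \Omega\!\left(\frac{k^{p-1}}{\epsilon^{2}}\right)
   = \Omega\!\left(\epsilon^{-2/p} N^{1-1/p}\right), \\
\text{space}
  &= \Omega\!\left(\frac{\epsilon^{-2/p} N^{1-1/p}}{t\,k}\right)
   = \Omega\!\left(\frac{\epsilon^{-2/p} N^{1-1/p}}{t\,\epsilon^{2/p} N^{1/p}}\right)
   = \Omega\!\left(\frac{N^{1-2/p}}{\epsilon^{4/p}\, t}\right).
\end{align*}
```

Setting $p = 2$ in the last expression gives $\Omega(1/(\epsilon^{2} t))$, which is the sense in which the bound agrees with the known $p = 2$ bound for constant $t$.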

As mentioned, we observe that 2-GAP-ORT has information cost $\Omega(1/\epsilon^2)$ under the product uniform 
distribution or the protocol must have super-polynomial (in $N$) communication. Since 2-GAP-ORT can be 
written as the AND of two GHD instances on $\Theta(1/\epsilon^2)$ bits (see the Corollary after the Main Theorem in 
[47]), this implies a useful distribution for which either the communication cost of GHD is super-polynomial 
or the external information cost is at least $\Omega(1/\epsilon^2)$, partly answering Question 25 in the Open Problems in 
Data Streams list from the Bertinoro and IITK workshops [3]. Using standard direct sum theorems, this 
implies solving $r$ independent instances of $F_0$ or $F_2$, say, in a data stream requires $\Omega(r/\epsilon^2)$ bits of space, 
which was unknown. 

Other Related Work: There are quite a few papers on multiparty number-in-hand communication 
complexity, though they are not directly relevant for the problems studied in this paper. Alon et al. [4] and 
Bar-Yossef et al. [8] studied lower bounds for multiparty set-disjointness, which has applications to 
$p$-th frequency moment estimation for $p > 2$ in the streaming model. Their results were further improved 
in [15, 29, 36]. Chakrabarti et al. [13] studied random-partition communication lower bounds for multiparty 
set-disjointness and pointer jumping, which have a number of applications in the random-order data 
stream model. Other work includes Chakrabarti et al. [14] for median selection, Magniez et al. [42] and 
Chakrabarti et al. [12] for streaming language recognition. Very few studies have been conducted in the 
message-passing model. Duris and Rolim [23] proved several lower bounds in the message-passing model, 
but only for some simple boolean functions. Three related but more restrictive private-message models were 
studied by Gal and Gopalan [27], Ergün and Jowhari [24], and Guha and Huang [30]. The first two only 
investigated deterministic protocols and the third was tailored for the random-order data stream model. 

Recently Phillips et al. [45] introduced a technique called symmetrization for the number-in-hand 
communication model. The idea is to try to find a symmetric hard distribution for the $k$ players. Then one 
reduces the $k$-player problem to a 2-player problem by assigning Alice the input of a random player and 
Bob the inputs of the remaining $k - 1$ players. The answer to the $k$-player problem gives the answer to the 
2-player problem. By symmetrization one can argue that if the communication lower bound for the resulting 
2-player problem is $L$, then the lower bound for the $k$-player problem is $\Omega(kL)$. While symmetrization can 
be used to solve some problems for which other techniques are not known, such as bitwise AND and OR, it 
has several serious limitations. First, symmetrization requires a symmetric hard distribution, and for many 
problems this is not known or unlikely to exist; this is true of all of the problems considered in this paper. 
Second, for many problems, when Bob knows the inputs of $k - 1$ players, he can determine the answer 
without any communication, and so no embedding into a $k$-player protocol is possible. Also, it does not 
give information cost bounds, and so it is difficult to use when composing problems as is done in this paper. 

Paper Outline: In Section 3 and Section 4 we prove our lower bounds for $F_0$ and $F_p$, $p > 1$. The lower 
bounds apply to functional monitoring, but hold even in the static model. In Section 5 we show improved 
upper bounds for $F_p$, $p > 1$, for functional monitoring. Finally, in Section 6 we prove lower bounds for 
all-quantile, heavy hitters, entropy and $\ell_p$ for any $p > 1$ in the blackboard model. 

2 Preliminaries 

In this section we review some basics on communication complexity and information theory. 

Information Theory We refer the reader to [22] for a comprehensive introduction to information theory. 
Here we review a few concepts and notation. 

Let $H(X)$ denote the Shannon entropy of the random variable $X$, and let $H_b(p)$ denote the binary 
entropy function when $p \in [0, 1]$. Let $H(X \mid Y)$ denote conditional entropy of $X$ given $Y$. Let $I(X; Y)$ 
denote the mutual information between two random variables $X, Y$. Let $I(X; Y \mid Z)$ denote the mutual 
information between two random variables $X, Y$ conditioned on $Z$. The following is a summary of 
the basic properties of entropy and mutual information that we need. 

Proposition 1 Let $X, Y, Z$ be random variables. 

1. If $X$ takes values in $\{1, 2, \ldots, m\}$, then $H(X) \in [0, \log m]$. 

2. $H(X) \ge H(X \mid Y)$ and $I(X; Y) = H(X) - H(X \mid Y) \ge 0$. 

3. If $X$ and $Z$ are independent, then we have $I(X; Y \mid Z) \ge I(X; Y)$. 

4. (Chain rule of mutual information) $I(X, Y; Z) = I(X; Z) + I(Y; Z \mid X)$. And in general, for any 
random variables $X_1, X_2, \ldots, X_n, Y$, $I(X_1, \ldots, X_n; Y) = \sum_{i=1}^{n} I(X_i; Y \mid X_1, \ldots, X_{i-1})$. 

5. (Data processing inequality) If $X$ and $Z$ are conditionally independent given $Y$, then $I(X; Y \mid Z) \le 
I(X; Y)$. 
6. (Fano's inequality) Let $X$ be a random variable chosen from domain $\mathcal{X}$ according to distribution 
$\mu_X$, and $Y$ be a random variable chosen from domain $\mathcal{Y}$ according to distribution $\mu_Y$. For any 
reconstruction function $g : \mathcal{Y} \to \mathcal{X}$ with error $\delta_g$, 

$$H_b(\delta_g) + \delta_g \log(|\mathcal{X}| - 1) \ge H(X \mid Y).$$ 

7. (The Maximum Likelihood Estimator principle) With the notation as in Fano's inequality, if the 
reconstruction function is $g(y) = x$ for the $x$ that maximizes the conditional probability $\mu_X(x \mid Y = y)$, 
then 

$$\delta_g \le 1 - \frac{1}{2^{H(X \mid Y)}}.$$ 
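The entropy and mutual information identities above can be checked numerically on a small joint distribution. The following Python sketch is an illustration only: the helper names are hypothetical, distributions are plain dictionaries from outcome tuples to probabilities, and conditional mutual information is computed from joint entropies via $I(A; B \mid C) = H(A, C) + H(B, C) - H(A, B, C) - H(C)$.

```python
import math

def marginal(joint, idx):
    """Marginal distribution of the coordinates listed in idx."""
    out = {}
    for outcome, pr in joint.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + pr
    return out

def entropy(joint, idx):
    """Shannon entropy (base 2) of the coordinates listed in idx."""
    return -sum(pr * math.log2(pr)
                for pr in marginal(joint, idx).values() if pr > 0)

def mutual_information(joint, a, b, cond=()):
    """I(A; B | C) from joint entropies; cond=() gives plain I(A; B)."""
    a, b, c = tuple(a), tuple(b), tuple(cond)
    return (entropy(joint, a + c) + entropy(joint, b + c)
            - entropy(joint, a + b + c) - entropy(joint, c))

# X, Y independent fair bits and Z = X xor Y; outcomes are (x, y, z).
joint = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
X, Y, Z = (0,), (1,), (2,)

lhs = mutual_information(joint, X + Y, Z)             # I(X, Y; Z)
rhs = (mutual_information(joint, X, Z)                # I(X; Z)
       + mutual_information(joint, Y, Z, cond=X))     # + I(Y; Z | X)
```

In this example $X$ and $Z$ are independent, and conditioning on $Z$ raises $I(X; Y)$ from 0 to 1 bit, so it also illustrates item 3 of the proposition.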

Communication complexity In the two-party randomized communication complexity model (see e.g., 
[41]), we have two players Alice and Bob. Alice is given $x \in \mathcal{X}$ and Bob is given $y \in \mathcal{Y}$, and they 
want to jointly compute a function $f(x, y)$ by exchanging messages according to a protocol $\Pi$. Let $\Pi(x, y)$ 
denote the message transcript when Alice and Bob run protocol $\Pi$ on input pair $(x, y)$. We sometimes 
abuse notation by identifying the protocol and the corresponding random transcript, as long as there is no 
confusion. 

The communication complexity of a protocol is defined as the maximum number of bits exchanged 
over all pairs of inputs. We say a protocol $\Pi$ computes $f$ with error probability $\delta$ ($0 < \delta < 1$) if there 
exists a function $g$ such that for all input pairs $(x, y)$, $\Pr[g(\Pi(x, y)) \ne f(x, y)] \le \delta$. The $\delta$-error randomized 
communication complexity, denoted by $R_\delta(f)$, is the cost of the minimum-communication protocol that 
computes $f$ with error probability $\delta$. 

The definitions for two-party protocols can be easily extended to the multiparty setting, where we have $k$ 
players and the $i$-th player is given an input $x_i \in \mathcal{X}_i$. Again the $k$ players want to jointly compute a function 
$f(x_1, x_2, \ldots, x_k)$ by exchanging messages according to a protocol $\Pi$. 

Information complexity Information complexity was introduced in a series of papers including [8, 17]. We 
refer the reader to Bar-Yossef's Thesis [7]; see Chapter 6 for a detailed introduction. Here we briefly review 
the concepts of information cost and conditional information cost for $k$-player communication problems. 
All of them are defined in the blackboard number-in-hand model. 

Let $\mu$ be an input distribution on $\mathcal{X}_1 \times \mathcal{X}_2 \times \ldots \times \mathcal{X}_k$ and let $X$ be a random input chosen from $\mu$. Let 
$\Pi$ be a randomized protocol running on inputs in $\mathcal{X}_1 \times \mathcal{X}_2 \times \ldots \times \mathcal{X}_k$. The information cost of $\Pi$ with 
respect to $\mu$ is $I(X; \Pi)$⁶. The information complexity of a problem $f$ with respect to a distribution $\mu$ and 
error parameter $\delta$ ($0 < \delta < 1$), denoted $\mathrm{IC}_{\mu,\delta}(f)$, is the information cost of the best $\delta$-error protocol for $f$ 
with respect to $\mu$. We will work in the public coin model, in which all parties also share a common source 
of randomness. 

We say a distribution $\lambda$ partitions $\mu$ if conditioned on $\lambda$, $\mu$ is a product distribution. Let $X$ be a random 
input chosen from $\mu$ and $D$ be a random variable chosen from $\lambda$. For a randomized protocol $\Pi$ on 
$\mathcal{X}_1 \times \mathcal{X}_2 \times \ldots \times \mathcal{X}_k$, the conditional information cost of $\Pi$ with respect to the distribution $\mu$ on 
$\mathcal{X}_1 \times \mathcal{X}_2 \times \ldots \times \mathcal{X}_k$ and a distribution $\lambda$ partitioning $\mu$ is defined as $I(X; \Pi \mid D)$. The conditional 
information complexity of a problem $f$ with respect to a distribution $\mu$, a distribution $\lambda$ partitioning $\mu$, and 
error parameter $\delta$ ($0 < \delta < 1$), denoted $\mathrm{IC}_{\mu,\delta}(f \mid \lambda)$, is the information cost of the best $\delta$-error protocol 
for $f$ with respect to $\mu$ and $\lambda$. The following proposition can be found in [8]. 

⁶ In some of the literature this is called the external information cost, in contrast with the internal information cost. In this paper 
we only need the former. 
Proposition 2 For any distribution $\mu$, distribution $\lambda$ partitioning $\mu$ and error parameter $\delta$ ($0 < \delta < 1$), 

$$R_\delta(f) \ge \mathrm{IC}_{\mu,\delta}(f) \ge \mathrm{IC}_{\mu,\delta}(f \mid \lambda).$$ 

Statistical distance measures Given two probability distributions $\mu$ and $\nu$ over the same space $\mathcal{X}$, the 
following statistical distances will be used in this paper. 

1. Total variation distance: $V(\mu, \nu) \stackrel{\mathrm{def}}{=} \max_{A \subseteq \mathcal{X}} |\mu(A) - \nu(A)|$. 

2. Hellinger distance: $h(\mu, \nu) \stackrel{\mathrm{def}}{=} \sqrt{\frac{1}{2} \sum_{x \in \mathcal{X}} \left(\sqrt{\mu(x)} - \sqrt{\nu(x)}\right)^2}$. 

We have the following relation between total variation distance and Hellinger distance (cf. [7], Chapter 2). 

Proposition 3 $h^2(\mu, \nu) \le V(\mu, \nu) \le h(\mu, \nu)\sqrt{2 - h^2(\mu, \nu)}$. 
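As a quick numerical illustration of the two distances and the inequality relating them, the Python sketch below evaluates both on a small finite space. The function names are hypothetical; the total variation distance is computed through the standard identity with half the $L_1$ distance, and a brute-force maximum over all events is included only to confirm that identity on this toy example.

```python
import math
from itertools import combinations

def total_variation(mu, nu):
    """V(mu, nu) = max_A |mu(A) - nu(A)|, via half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))

def total_variation_bruteforce(mu, nu):
    """Maximize |mu(A) - nu(A)| over every subset A (tiny spaces only)."""
    n = len(mu)
    best = 0.0
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            best = max(best, abs(sum(mu[i] - nu[i] for i in subset)))
    return best

def hellinger(mu, nu):
    """h(mu, nu) = sqrt( (1/2) * sum_x (sqrt(mu(x)) - sqrt(nu(x)))^2 )."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(mu, nu)))

# Two distributions on a three-element space (toy data for this example).
mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]
v = total_variation(mu, nu)
h = hellinger(mu, nu)
lower, upper = h ** 2, h * math.sqrt(2 - h ** 2)
```

Here `lower <= v <= upper` holds, matching the sandwich stated in the proposition.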

Conventions In the rest of the paper we call a player a site, so as to be consistent with the distributed functional 
monitoring model. We denote $[n] = \{1, \ldots, n\}$. Let $\oplus$ be the XOR function. All logarithms are base-2 
unless noted otherwise. We say $\tilde{W}$ is a $(1 + \epsilon)$-approximation of $W$, $0 < \epsilon < 1$, if $W \le \tilde{W} \le (1 + \epsilon)W$, 
and we say $\tilde{W}$ is an $\epsilon$-approximation of $W$ if $W - \epsilon \le \tilde{W} \le W + \epsilon$. That is, the former has multiplicative 
error and the latter has additive error. 

3 A Lower Bound for $F_0$ 

We introduce the problem $k$-GAP-MAJ, and then compose it with 2-DISJ to prove a lower bound for $F_0$. 

3.1 The $k$-GAP-MAJ Problem 

In the $k$-GAP-MAJ problem we have $k$ sites $S_1, S_2, \ldots, S_k$, and each has a bit $z_i$ ($1 \le i \le k$). They want 
to compute the following function in the blackboard model. 

$$k\text{-GAP-MAJ}(z_1, z_2, \ldots, z_k) = \begin{cases} 0, & \text{if } \sum_{i \in [k]} z_i \le \beta k - \sqrt{\beta k}, \\ 1, & \text{if } \sum_{i \in [k]} z_i \ge \beta k + \sqrt{\beta k}, \\ *, & \text{otherwise,} \end{cases}$$ 

where $\beta$ ($\omega(1/k) \le \beta \le 1/2$) is some fixed value, and "$*$" means that the answer can be arbitrary. We 
define the input distribution $\mu$ as follows. For each $i \in [k]$, let $Z_i = 1$ with probability $\beta$ and $Z_i = 0$ with 
probability $(1 - \beta)$. 

Let $Z = \{Z_1, Z_2, \ldots, Z_k\}$ be a random input chosen according to distribution $\mu$. Let $\Pi$ be the transcript 
of any protocol for $k$-GAP-MAJ on the random input vector $Z$. Let $\mu_\Pi$ be the probability distribution of the 
random transcript $\Pi$. 
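The $k$-GAP-MAJ function and its input distribution can be sketched in a few lines of Python. This is an illustration under stated assumptions: the names are hypothetical, the two thresholds are taken to be $\beta k - \sqrt{\beta k}$ and $\beta k + \sqrt{\beta k}$ with non-strict inequalities, and the "don't care" output $*$ is represented by `None`.

```python
import math
import random

def k_gap_maj(bits, beta):
    """Return 0, 1, or None (the '*' case) for a list of k bits."""
    k = len(bits)
    total = sum(bits)
    mean = beta * k
    gap = math.sqrt(beta * k)
    if total <= mean - gap:
        return 0
    if total >= mean + gap:
        return 1
    return None  # inside the gap: any answer is acceptable

def sample_input(k, beta, rng):
    """Each site independently holds 1 with probability beta, else 0."""
    return [1 if rng.random() < beta else 0 for _ in range(k)]

# Example with k = 100 and beta = 0.25: mean 25, gap sqrt(25) = 5.
rng = random.Random(1)
z = sample_input(100, 0.25, rng)
answer = k_gap_maj(z, 0.25)
```

Under this input distribution the sum of the bits has standard deviation on the order of $\sqrt{\beta k}$, so both the 0 case and the 1 case occur with constant probability; this is what makes the problem nontrivial.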

Definition 1 We say a transcript $\pi$ is weak if for at least $0.5k$ of the $Z_i$ ($i \in [k]$), it holds that $H(Z_i \mid \Pi = 
\pi) \ge H_b(0.01\beta)$; otherwise we say it is strong. 

In this section we will prove the following main theorem for $k$-GAP-MAJ. Intuitively, it says that in 
order to correctly compute $k$-GAP-MAJ with a good probability, we have to learn $\Omega(k)$ of the $Z_i$'s well. 

Theorem 1 If a protocol correctly computes $k$-GAP-MAJ on input distribution $\mu$ with error probability $\delta$ 
for some sufficiently small constant $\delta$, then $\Pr_{\Pi \sim \mu_\Pi}[\Pi \text{ is strong}] = \Omega(1)$. 

We have the following immediate corollary, which will be used to prove the lower bound for the quantile 
problem in Section 6.1. 

Corollary 1 Suppose that $\beta = \Theta(1)$, then $I(Z; \Pi) = \Omega(k)$ for any protocol that computes $k$-GAP-MAJ on 
input distribution $\mu$ with error probability $\delta$ for some sufficiently small constant $\delta$. 

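Proof: By the chain rule and independence, we have 

$$\begin{aligned} I(Z; \Pi) &\ge \sum_{i \in [k]} I(Z_i; \Pi) \\ &\ge \sum_{\pi : \pi \text{ is strong}} \left( \Pr_{\Pi \sim \mu_\Pi}[\Pi = \pi] \sum_{i \in [k]} \left( H(Z_i) - H(Z_i \mid \Pi = \pi) \right) \right) \\ &\ge \Omega(1) \cdot 0.5k \cdot \left( H_b(\beta) - H_b(0.01\beta) \right) \\ &\ge \Omega(k) \quad (\text{for } \beta = \Theta(1)). \end{aligned}$$ 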



Now we prove Theorem 1. The following observation, which easily follows from the rectangle property 
of communication protocols, is crucial in our proof. 

Observation 1 Conditioned on $\Pi$, all random variables $Z_1, Z_2, \ldots, Z_k$ are independent. 

Let $c_1$ be a constant chosen later. We introduce the following definition. 

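Definition 2 (Goodness of a transcript) We say a transcript $\pi$ is bad$^+$ if $\mathbf{E}\left[\sum_{i \in [k]} Z_i \mid \Pi = \pi\right] \ge \beta k + 
c_1\sqrt{\beta k}$ and bad$^-$ if $\mathbf{E}\left[\sum_{i \in [k]} Z_i \mid \Pi = \pi\right] \le \beta k - c_1\sqrt{\beta k}$. In both cases we say $\pi$ is bad. Otherwise we 
say it is good. 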

We first show that a transcript is bad only with a small probability. 

Lemma 1 $\Pr_{\Pi \sim \mu_\Pi}[\Pi \text{ is bad}] \le 2e^{-(c_1 - 1)^2/3} / (1 - e^{-1/3})$. 

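Proof: Set $c_2 = c_1 - 1$. We say $Z = \{Z_1, Z_2, \ldots, Z_k\}$ is a joker$^+$ if $\sum_{i \in [k]} Z_i \ge \beta k + c_2\sqrt{\beta k}$, and a 
joker$^-$ if $\sum_{i \in [k]} Z_i \le \beta k - c_2\sqrt{\beta k}$. In both cases we say $Z$ is a joker. 

First, by applying a Chernoff bound on random variables $Z_i$ for $i = 1, \ldots, k$ we have that 

$$\Pr[Z \text{ is a joker}^+] = \Pr\left[\sum_{i \in [k]} Z_i \ge \beta k + c_2\sqrt{\beta k}\right] \le e^{-c_2^2/3}.$$ 

Second, by applying a Chernoff bound on random variables $Z_i$ for $i = 1, \ldots, k$ conditioned on $\Pi$ being bad$^+$, 

$$\begin{aligned} &\Pr[Z \text{ is a joker}^+ \mid \Pi \text{ is bad}^+] \\ &\quad\ge \sum_{\pi} \Pr[\Pi = \pi \mid \pi \text{ is bad}^+] \, \Pr[Z \text{ is a joker}^+ \mid \Pi = \pi, \pi \text{ is bad}^+] \\ &\quad= \sum_{\pi} \Pr[\Pi = \pi \mid \pi \text{ is bad}^+] \, \Pr\left[\sum_{i \in [k]} Z_i \ge \beta k + c_2\sqrt{\beta k} \;\middle|\; \mathbf{E}\left[\sum_{i \in [k]} Z_i \mid \Pi = \pi\right] \ge \beta k + c_1\sqrt{\beta k}, \Pi = \pi\right] \\ &\quad\ge \sum_{\pi} \Pr[\Pi = \pi \mid \pi \text{ is bad}^+] \left(1 - e^{-(c_1 - c_2)^2/3}\right) \\ &\quad= 1 - e^{-(c_1 - c_2)^2/3}. \end{aligned}$$ 

Finally by Bayes' theorem, we have that 

$$\Pr[\Pi \text{ is bad}^+] = \frac{\Pr[Z \text{ is a joker}^+] \cdot \Pr[\Pi \text{ is bad}^+ \mid Z \text{ is a joker}^+]}{\Pr[Z \text{ is a joker}^+ \mid \Pi \text{ is bad}^+]} \le \frac{e^{-c_2^2/3}}{1 - e^{-(c_1 - c_2)^2/3}}.$$ 

Similarly, we can also show that $\Pr[\Pi \text{ is bad}^-] \le e^{-c_2^2/3}/(1 - e^{-(c_1 - c_2)^2/3})$. Therefore $\Pr[\Pi \text{ is bad}] \le 
2e^{-c_2^2/3}/(1 - e^{-(c_1 - c_2)^2/3}) = 2e^{-(c_1 - 1)^2/3}/(1 - e^{-1/3})$ (recall that we set $c_2 = c_1 - 1$). ■ 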
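To get a feel for how the bound $2e^{-(c_1 - 1)^2/3}/(1 - e^{-1/3})$ from this proof behaves as the constant $c_1$ grows, the following Python sketch evaluates it and searches for a value of $c_1$ that pushes it below a target. The function names, the step size and the target are assumptions made for illustration; the paper only says that $c_1$ is "chosen later".

```python
import math

def bad_transcript_bound(c1):
    """Evaluate 2 * exp(-(c1 - 1)^2 / 3) / (1 - exp(-1/3))."""
    numerator = 2.0 * math.exp(-((c1 - 1.0) ** 2) / 3.0)
    denominator = 1.0 - math.exp(-1.0 / 3.0)
    return numerator / denominator

def smallest_c1(target, step=0.01):
    """Scan upward from c1 = 1 until the bound drops to the target or below."""
    c1 = 1.0
    while bad_transcript_bound(c1) > target:
        c1 += step
    return c1

# The bound is vacuous (above 1) at c1 = 1 and decays quickly afterwards.
values = [bad_transcript_bound(c) for c in (1, 2, 3, 4, 5, 6)]
c1_for_one_percent = smallest_c1(0.01)
```

Because the exponent is quadratic in $c_1$, a modest constant already makes bad transcripts rare, which is all the later argument needs.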



Our next lemma indicates that if a transcript $\pi$ is good and weak, then the sum of the $Z_i$'s will deviate from 
its mean considerably with a significant probability. Let $c_3$ be a constant chosen later. 

Lemma 2 For a good and weak transcript $\pi$, there exists a universal constant $c$ such that 

$$\Pr\left[\sum_{i \in [k]} Z_i \le \beta k - (c_3 - c_1)\sqrt{\beta k} \;\middle|\; \Pi = \pi\right] \ge c \cdot e^{-100(c_3 + 1)^2},$$ 

and 

$$\Pr\left[\sum_{i \in [k]} Z_i \ge \beta k + (c_3 - c_1)\sqrt{\beta k} \;\middle|\; \Pi = \pi\right] \ge c \cdot e^{-100(c_3 + 1)^2}.$$ 

Proof: We only need to prove the first inequality. The proof for the second inequality is the same.

Since $\pi$ is weak, we can find a set $T \subseteq [k]$ with $|T| = 0.5k$, such that for any $i \in T$ we have $H(Z_i \mid \Pi = \pi) > H_b(0.01\beta)$. Let $N_1 = \sum_{i \in T} Z_i$ and $N_2 = \sum_{i \in [k] \setminus T} Z_i$. Let $c_4$ and $c_5$ with $c_5 - c_4 = c_3$ be constants chosen later. The idea of the proof is to show that conditioned on $\Pi = \pi$, $N_2$ will concentrate around $\mathbf{E}[N_2 \mid \Pi = \pi]$ within $c_4\sqrt{\beta k}$ with a good probability, while $N_1$ will deviate from $\mathbf{E}[N_1 \mid \Pi = \pi]$ by at least $c_5\sqrt{\beta k}$ with a good probability. Therefore $\sum_{i \in [k]} Z_i = N_1 + N_2$ will deviate from its mean by at least $(c_5 - c_4)\sqrt{\beta k} = c_3\sqrt{\beta k}$ with a good probability. Here we use the fact that $N_1$ and $N_2$ are independent random variables conditioned on $\Pi = \pi$.

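To show that $N_2$ will concentrate around its mean, we use a Chernoff bound. Since $\pi$ is good, we have by the definition of the goodness of a transcript that $\mathbf{E}[N_2 \mid \Pi = \pi] \le \mathbf{E}[\sum_{i \in [k]} Z_i \mid \Pi = \pi] \le \beta k + c_1\sqrt{\beta k} \le 2\beta k$. Thus by a Chernoff bound,

$$\Pr\big[N_2 - \mathbf{E}[N_2 \mid \Pi = \pi] \le -c_4\sqrt{\beta k} \;\big|\; \Pi = \pi\big] \le e^{-\frac{(c_4\sqrt{\beta k})^2}{3 \cdot 2\beta k}} = e^{-c_4^2/6}. \qquad (1)$$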



To show that $N_1$ will deviate from its mean considerably, we prove an anti-concentration property of the distribution of $N_1$ conditioned on $\Pi = \pi$. We need the following result which is an easy consequence of Feller [26] (cf. [44]).

Lemma 3 ([44]) Let $Y$ be a sum of independent random variables, each attaining values in $[0, 1]$, and let $\sigma = \sqrt{\mathrm{Var}[Y]} \ge 200$. Then for all $t \in [0, \sigma^2/100]$, we have

$$\Pr[Y \ge \mathbf{E}[Y] + t] \ge c \cdot e^{-t^2/(3\sigma^2)}$$

for a universal constant $c > 0$.
Since for each $i \in T$ it holds that $H(Z_i \mid \Pi = \pi) > H_b(0.01\beta)$, we have $\mathrm{Var}(Z_i \mid \Pi = \pi) > 0.01\beta(1 - 0.01\beta) > 0.009\beta$. Since conditioned on $\Pi = \pi$ the $Z_i$'s are independent, we have $\mathrm{Var}(N_1 \mid \Pi = \pi) > 0.009\beta \cdot 0.5k > 0.004\beta k$. By Lemma 3 we have, for some universal constant $c$,

$$\Pr\big[N_1 \ge \mathbf{E}[N_1 \mid \Pi = \pi] + c_5\sqrt{\beta k} \;\big|\; \Pi = \pi\big] \ge c \cdot e^{-\frac{(c_5\sqrt{\beta k})^2}{3 \cdot 0.004\beta k}} \ge c \cdot e^{-100c_5^2}. \qquad (2)$$

Set $c_4 = 1$ and $c_5 = c_3 + 1$. By (1) and (2) and the fact that $\pi$ is good and weak, we obtain



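$$\begin{aligned}
\Pr\Big[\sum_{i \in [k]} Z_i \ge \beta k + (c_3 - c_1)\sqrt{\beta k} \;\Big|\; \Pi = \pi\Big]
&\ge \Pr\Big[\sum_{i \in [k]} Z_i - \mathbf{E}\Big[\sum_{i \in [k]} Z_i \mid \Pi = \pi\Big] \ge c_3\sqrt{\beta k} \;\Big|\; \Pi = \pi\Big] \\
&\ge \big(1 - e^{-c_4^2/6}\big) \cdot c \cdot e^{-100c_5^2} \\
&= c \cdot (1 - e^{-1/6}) \cdot e^{-100(c_3+1)^2},
\end{aligned}$$

where $c$ is a universal constant.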

Now we prove our main theorem for $k$-GAP-MAJ.

Proof: (of Theorem 1) First, by Lemma 1 we know that a transcript $\pi$ sampled according to $\mu$ is good with probability at least $1 - 2e^{-(c_1-1)^2/3}/(1 - e^{-1/3})$. Second, conditioned on $\pi$ being good, it cannot be weak with probability more than $1/2$. We show this by contradiction. Suppose that $\pi$ is weak with probability at least $1/2$ conditioned on it being good. Set $c_3 - c_1 = 1$, $c_1 = 5$ and the constant $\delta$ sufficiently small. By Lemma 2, the error probability of the protocol will be at least

$$\Big(1 - 2e^{-(c_1-1)^2/3}/(1 - e^{-1/3})\Big) \cdot \frac{1}{2} \cdot c \cdot e^{-100(c_1+2)^2} > \delta,$$

violating the success guarantee of Theorem 1.

Therefore with probability at least $\frac{1}{2} \cdot \big(1 - 2e^{-(c_1-1)^2/3}/(1 - e^{-1/3})\big) \ge \Omega(1)$, $\pi$ is both good and strong (thus strong). We are done. ■



3.2 The 2-DISJ Problem 

In 2-DISJ Alice and Bob each have an $n$-bit vector. If we view vectors as sets, then each of them has a subset of $[n]$ corresponding to the 1 bits. Let $x$ be the set of Alice and $y$ be the set of Bob. The goal is to return 1 if $x \cap y \ne \emptyset$, and 0 otherwise.

We define the input distribution $\tau$ as follows. Let $\ell = (n + 1)/4$. With probability $1/t$, $x$ and $y$ are random subsets of $[n]$ such that $|x| = |y| = \ell$ and $|x \cap y| = 1$. And with probability $1 - 1/t$, $x$ and $y$ are random subsets of $[n]$ such that $|x| = |y| = \ell$ and $x \cap y = \emptyset$. Razborov [46] (see also [37]) proved that for $t = 4$, $D_{\tau}^{1/(100t)}(\text{2-DISJ}) = \Omega(n)$. It is easy to extend this result to general $t$, since if a protocol solves the problem for general $t$ with error $1/(100t)$ and communication cost $o(n)$, then it also solves the problem when $t = 4$ with error $1/400$ and communication cost $o(n)$, contradicting the lower bound for $t = 4$.
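The input distribution for 2-DISJ described above can be sampled directly. The sketch below is only an illustration: the names `sample_tau` and `disj` are hypothetical, and it assumes $n + 1$ is divisible by 4 so that $\ell = (n+1)/4$ is an integer.

```python
import random


def sample_tau(n: int, t: int, rng: random.Random):
    """Sample (x, y) with |x| = |y| = ell; they share exactly one element
    with probability 1/t and are disjoint otherwise."""
    if (n + 1) % 4 != 0:
        raise ValueError("this sketch assumes n + 1 is divisible by 4")
    ell = (n + 1) // 4
    universe = list(range(1, n + 1))
    if rng.random() < 1.0 / t:
        # Intersecting case: one shared element plus two disjoint remainders.
        picked = rng.sample(universe, 2 * ell - 1)
        shared = picked[0]
        x = {shared, *picked[1:ell]}
        y = {shared, *picked[ell:]}
    else:
        # Disjoint case: split 2 * ell distinct elements in half.
        picked = rng.sample(universe, 2 * ell)
        x = set(picked[:ell])
        y = set(picked[ell:])
    return x, y


def disj(x: set, y: set) -> int:
    """2-DISJ as defined in the text: 1 if the sets intersect, 0 otherwise."""
    return 1 if x & y else 0


if __name__ == "__main__":
    rng = random.Random(0)
    x, y = sample_tau(19, 4, rng)
    print(sorted(x), sorted(y), disj(x, y))
```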
3.3 The Complexity of $F_0$

We choose the input distribution $\zeta$ for the $(1 + \varepsilon)$-approximate $F_0$ problem as follows. Set $n = A/\varepsilon^2$ where $A = 20000/\delta$ is a constant, $\beta = 1/(k\varepsilon^2)$ and $t = 1/\beta$. We start with a set $Y$ with cardinality $\ell = (n + 1)/4$ chosen uniformly at random from $[n]$, and then choose $X_1, X_2, \ldots, X_k$ according to the marginal distribution $\tau \mid Y$ independently, where $\tau$ is the hard input distribution for 2-DISJ. We assign $X_1, X_2, \ldots, X_k$ to the $k$ sites, respectively.

Let $T_i = X_i \cap Y$ if $|X_i \cap Y| \ne 0$ and NULL otherwise. Let $N = |\{i \in [k] \mid T_i \ne \text{NULL}\}|$. Let $R = F_0(T_1, T_2, \ldots, T_k)$. The following lemma shows that $R$ will concentrate around its expectation $\mathbf{E}[R]$, which can be calculated exactly.

Lemma 4 With probability at least $(1 - 6500/A)$, we have $|R - \mathbf{E}[R]| \le 1/(10\varepsilon)$, where $\mathbf{E}[R] = (1 - \lambda)N$ for some fixed constant $0 < \lambda \le 4/A$.

Proof: We can think of our problem as a bin-ball game: those $T_i$ ($i \in [k]$) that are not NULL are balls (thus we have $N$ balls), and elements in the set $Y$ are bins (thus we have $\ell$ bins). We throw each of the $N$ balls into one of the $\ell$ bins uniformly at random. Our goal is to estimate the number of non-empty bins at the end of the process.

By a Chernoff bound we have that with probability at least $(1 - e^{-\Omega(\beta k)}) = 1 - o(1)$, we have $N \le 2\beta k = 2/\varepsilon^2$. By Fact 1 and Lemma 1 in [39] we have $\mathbf{E}[R] = \ell\,(1 - (1 - 1/\ell)^N)$ and $\mathrm{Var}[R] \le 4N^2/\ell$.
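Thus by Chebyshev's inequality we have

$$\Pr\big[|R - \mathbf{E}[R]| > 1/(10\varepsilon)\big] \le \frac{\mathrm{Var}[R]}{(1/(10\varepsilon))^2} \le \frac{6400}{A}.$$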

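Let $\theta = N/\ell \le 8/A$. We can write

$$\mathbf{E}[R] = \ell\,(1 - e^{-\theta}) + O(1) = \theta\ell \left(1 - \frac{\theta}{2!} + \frac{\theta^2}{3!} - \frac{\theta^3}{4!} + \cdots\right) + O(1).$$

This series converges and thus we can write $\mathbf{E}[R] = (1 - \lambda)\theta\ell = (1 - \lambda)N$ for some fixed constant $0 < \lambda \le \theta/2 \le 4/A$. ■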
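The balls-and-bins calculation in this proof is easy to check numerically. The following sketch uses the exact expectation $\ell(1 - (1 - 1/\ell)^N)$ quoted from [39] and reports the implied $\lambda = 1 - \mathbf{E}[R]/N$; the helper names are illustrative only, and the specific values of $N$ and $\ell$ are arbitrary choices for the demonstration.

```python
def expected_nonempty_bins(num_balls: int, num_bins: int) -> float:
    """Exact expected number of non-empty bins after throwing num_balls
    balls uniformly and independently into num_bins bins."""
    return num_bins * (1.0 - (1.0 - 1.0 / num_bins) ** num_balls)


def implied_lambda(num_balls: int, num_bins: int) -> float:
    """The lambda for which E[R] = (1 - lambda) * N."""
    return 1.0 - expected_nonempty_bins(num_balls, num_bins) / num_balls


if __name__ == "__main__":
    n_balls, n_bins = 40, 5000          # arbitrary demo values with N << ell
    theta = n_balls / n_bins
    lam = implied_lambda(n_balls, n_bins)
    print(f"theta = {theta}, lambda = {lam}, theta/2 = {theta / 2}")
```

In this regime $\lambda$ stays strictly between 0 and $\theta/2$, which is the relationship the proof relies on.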

The next lemma shows that we can use a protocol for $F_0$ to solve $k$-GAP-MAJ with good properties.

Lemma 5 If there exists a protocol $\mathcal{P}'$ that computes a $(1 + \alpha\varepsilon)$-approximation to $F_0$ (for some sufficiently small constant $\alpha$) on input distribution $\zeta$, with error probability $\delta/2$, then there exists a protocol $\mathcal{P}$ that computes the $k$-GAP-MAJ problem on input distribution $\mu$ with error probability $\delta$.

Proof: We first describe the construction of $\mathcal{P}$ using $\mathcal{P}'$ and then show its correctness.

Protocol $\mathcal{P}$. Given a random input $Z = \{Z_1, Z_2, \ldots, Z_k\}$ of $k$-GAP-MAJ chosen from distribution $\mu$, we construct an input $(X_1, X_2, \ldots, X_k)$ of $F_0$ as follows: We first choose $Y$ to be a subset of $[n]$ of size $\ell$ uniformly at random. Let $I_1^{\ell}, I_2^{\ell}, \ldots, I_k^{\ell}$ be random subsets of size $\ell$ from $[n] - Y$, and $I_1^{\ell-1}, I_2^{\ell-1}, \ldots, I_k^{\ell-1}$ be random subsets of size $(\ell - 1)$ from $[n] - Y$. Let $I_1^{1}, I_2^{1}, \ldots, I_k^{1}$ be random elements from $Y$. We next
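choose

$$X_i = \begin{cases} I_i^{\ell}, & \text{if } Z_i = 0, \\ I_i^{\ell-1} \cup \{I_i^{1}\}, & \text{if } Z_i = 1. \end{cases}$$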
It is easy to see that $(X_1, X_2, \ldots, X_k, Y)$ is chosen from distribution $\zeta$.

Protocol $\mathcal{P}$ first uses $\mathcal{P}'$ to compute $\tilde{W}$, which is a $(1 + \alpha\varepsilon)$-approximation of $F_0(X_1, X_2, \ldots, X_k)$, and then determines the answer to $k$-GAP-MAJ as follows.

$$k\text{-GAP-MAJ}(Z_1, Z_2, \ldots, Z_k) = \begin{cases} 1, & \text{if } \dfrac{\tilde{W} - (n - \ell)}{1 - \lambda} > 1/\varepsilon^2 \ [= \beta k], \\ 0, & \text{otherwise.} \end{cases}$$



Recall that we set $n = A/\varepsilon^2$, $\ell = (n + 1)/4$, and that $0 < \lambda \le 4/A$ is some fixed constant.

Correctness. Given a random input $(X_1, X_2, \ldots, X_k, Y)$ chosen from distribution $\zeta$, the exact value of $W = F_0(X_1, X_2, \ldots, X_k)$ can be written as the sum of two components,

$$W = Q + R, \qquad (3)$$

where $Q$ is a random variable that counts $F_0(\bigcup_{i \in [k]} X_i \setminus Y)$, and $R$ is a random variable that counts $F_0(\bigcup_{i \in [k]} X_i \cap Y)$. First, by our construction it is easy to see that with probability $(1 - n \cdot e^{-\Omega(k)}) = 1 - o(1)$ we have $Q = |[n] - Y| = n - \ell$, since each element in $[n] - Y$ will be chosen by every $X_i$ ($i = 1, 2, \ldots, k$) with probability more than $1/4$. Second, by Lemma 4 we know that with probability $(1 - 6500/A)$, $R$ is within $1/(10\varepsilon)$ from its mean $(1 - \lambda)N$ for some fixed constant $0 < \lambda \le 4/A$. Thus with probability $(1 - 6600/A)$, we can rewrite Equation (3) as

$$W = (n - \ell) + (1 - \lambda)N + \kappa_1, \qquad (4)$$

where $|\kappa_1| \le 1/(10\varepsilon)$.

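Since $\mathcal{P}'$, run on $(X_1, X_2, \ldots, X_k)$, computes a value $\tilde{W}$ which is a $(1 + \alpha\varepsilon)$-approximation of $W$, we can substitute $W$ with $\tilde{W}$ in Equation (4), resulting in the following.

$$\tilde{W} = (n - \ell) + (1 - \lambda)N + \kappa_1 + \kappa_2, \qquad (5)$$

where $|\kappa_2| \le \alpha\varepsilon \cdot F_0(X_1, X_2, \ldots, X_k) \le \alpha A/\varepsilon$. We can choose $\alpha = 1/(10A)$ to make $|\kappa_2| \le 1/(10\varepsilon)$. Now we have

$$N = \frac{\tilde{W} - (n - \ell) - \kappa_1 - \kappa_2}{1 - \lambda} = \frac{\tilde{W} - (n - \ell)}{1 - \lambda} + \kappa_3,$$

where $|\kappa_3| \le (1/(10\varepsilon) + 1/(10\varepsilon))/(1 - 4/A) \le 1/(4\varepsilon)$. Therefore $(\tilde{W} - (n - \ell))/(1 - \lambda)$ estimates $N = \sum_{i \in [k]} Z_i$ correctly up to an additive error $1/(4\varepsilon) < \sqrt{\beta k} = 1/\varepsilon$, and thus computes $k$-GAP-MAJ correctly. The total error probability of this simulation is at most $(\delta/2 + 6600/A)$, where the first term counts the error probability of $\mathcal{P}'$ and the second term counts the error probability introduced by the reduction. This is less than $\delta$ if we choose $A = 20000/\delta$. ■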
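The last step of the reduction, recovering $N$ from an approximate $F_0$ value, amounts to a single affine rescaling followed by a threshold test. The sketch below illustrates it under the assumption that $n$, $\ell$, $\lambda$ and $\varepsilon$ are known to the party doing the decoding; the function names and the concrete numbers in the demo are hypothetical.

```python
def estimate_n(w_tilde: float, n: int, ell: int, lam: float) -> float:
    """Estimate N = sum_i Z_i from an approximate F_0 value,
    following N ~ (W - (n - ell)) / (1 - lambda)."""
    return (w_tilde - (n - ell)) / (1.0 - lam)


def gap_maj_answer(w_tilde: float, n: int, ell: int, lam: float, eps: float) -> int:
    """Output 1 if the estimate of N exceeds beta * k = 1 / eps^2, else 0."""
    return 1 if estimate_n(w_tilde, n, ell, lam) > 1.0 / eps ** 2 else 0


if __name__ == "__main__":
    eps, lam = 0.1, 0.001               # demo values only
    n, ell = 199_999, 50_000            # ell = (n + 1) / 4
    true_n = 120                        # 1/eps^2 + 2/eps, i.e. above the gap
    for noise in (-2.0, 0.0, 2.0):      # |kappa_1 + kappa_2| <= 2 / (10 * eps)
        w_tilde = (n - ell) + (1.0 - lam) * true_n + noise
        print(noise, estimate_n(w_tilde, n, ell, lam),
              gap_maj_answer(w_tilde, n, ell, lam, eps))
```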

From Theorem 1 we know that if a protocol computes $k$-GAP-MAJ$(Z_1, Z_2, \ldots, Z_k)$ correctly with error probability $\delta$, then with probability $\Omega(1)$, for at least $0.5k$ of the $Z_i$'s we have $H(Z_i \mid \Pi = \pi) \le H_b(0.01\beta)$. This is equivalent to the following: With probability $\Omega(1)$, the protocol has to solve at least $0.5k$ copies of 2-DISJ$(X_i, Y)$ ($i \in [k]$) on input distribution $\tau$, each with error probability at most $0.01\beta = 1/(100t)$. By the lower bound for 2-DISJ on input distribution $\tau$ we know that solving each copy requires $\Omega(1/\varepsilon^2)$ bits of communication (recall that we set $n = A/\varepsilon^2$ for a constant $A$), thus in total we need $\Omega(k/\varepsilon^2)$ bits of communication.

Theorem 2 Any protocol that computes a $(1 + \varepsilon)$-approximate $F_0$ on input distribution $\zeta$ with error probability $\delta$ for some sufficiently small constant $\delta$ has communication complexity $\Omega(k/\varepsilon^2)$.
4 A Lower Bound for $F_p$ ($p > 1$)

We first introduce a problem called $k$-XOR, which can be considered to some extent as a combination of two $k$-DISJ (introduced in [4, 8]) instances, and then compose it with 2-GAP-ORT (introduced in [47]) to create another problem called the $k$-BLOCK-THRESH-XOR ($k$-BTX) problem. We prove that the communication complexity of $k$-BTX is large. Finally, we prove the communication complexity lower bound for $F_p$ by performing a reduction from $k$-BTX.

4.1 The 2-GAP-ORT Problem 

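In the 2-GAP-ORT problem we have two players Alice and Bob. Alice has a vector $x = \{x_1, x_2, \ldots, x_{1/\varepsilon^2}\} \in \{0, 1\}^{1/\varepsilon^2}$ and Bob has a vector $y = \{y_1, y_2, \ldots, y_{1/\varepsilon^2}\} \in \{0, 1\}^{1/\varepsilon^2}$. They want to compute

$$\text{2-GAP-ORT}(x, y) = \begin{cases} 1, & \text{if } \Big|\sum_{i \in [1/\varepsilon^2]} \mathrm{XOR}(x_i, y_i) - 1/(2\varepsilon^2)\Big| \ge 2/\varepsilon, \\ 0, & \text{if } \Big|\sum_{i \in [1/\varepsilon^2]} \mathrm{XOR}(x_i, y_i) - 1/(2\varepsilon^2)\Big| \le 1/\varepsilon, \\ *, & \text{otherwise.} \end{cases}$$

Let $\phi$ be the uniform distribution on $\{0, 1\}^{1/\varepsilon^2} \times \{0, 1\}^{1/\varepsilon^2}$ and let $(X, Y)$ be a random input chosen from distribution $\phi$.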
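As a concrete reading of the promise problem above, the sketch below evaluates 2-GAP-ORT on two bit vectors of equal length $m$, taking $1/\varepsilon = \sqrt{m}$ so that $m = 1/\varepsilon^2$. The function name `gap_ort` is illustrative, and returning `None` for the "don't care" region (written $*$ in the text) is a choice made for this example.

```python
import math


def gap_ort(x, y):
    """Evaluate 2-GAP-ORT on two equal-length bit vectors.

    With m = len(x) = 1/eps^2, returns 1 if the number of differing
    coordinates deviates from m/2 by at least 2/eps, 0 if it deviates by
    at most 1/eps, and None in the unconstrained gap region.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    m = len(x)
    inv_eps = math.sqrt(m)                      # 1/eps
    xor_sum = sum(a ^ b for a, b in zip(x, y))  # number of coordinates with XOR = 1
    deviation = abs(xor_sum - m / 2)
    if deviation >= 2 * inv_eps:
        return 1
    if deviation <= inv_eps:
        return 0
    return None


if __name__ == "__main__":
    zeros = [0] * 100
    print(gap_ort(zeros, zeros))                    # deviation 50
    print(gap_ort(zeros, [1] * 50 + [0] * 50))      # deviation 0
    print(gap_ort(zeros, [1] * 65 + [0] * 35))      # deviation 15, in the gap
```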

We assume that the communication cost of all protocols in the paper is at most $\mathrm{poly}(N)$, where $N$ is the number of coordinates in the vector inputs to the parties. This assumption is fine for our purposes because we will show in Section 4.4 that a $k$-party protocol $\mathcal{P}$ for $F_2$ implies a 2-party protocol $\mathcal{P}'$ for 2-GAP-ORT with asymptotically the same communication. Thus if $\mathcal{P}'$ has communication cost larger than $\mathrm{poly}(N)$, then we obtain an $\Omega(\mathrm{poly}(N))$ lower bound for the communication cost of $F_2$ immediately.

Theorem 3 Let $\Pi$ be the transcript of any protocol for 2-GAP-ORT on input distribution $\phi$ with error probability $\iota$, for a sufficiently small constant $\iota > 0$, and assume $\Pi$ uses at most $\mathrm{poly}(N)$ communication. Then, $I(X, Y; \Pi) \ge \Omega(1/\varepsilon^2)$.

Proof: Sherstov [47] proved that under the product uniform distribution $\phi$, any protocol that computes 2-GAP-ORT correctly with error probability $\iota$ for some sufficiently small constant $\iota > 0$ has communication complexity $\Omega(1/\varepsilon^2)$. By Theorem 1.3 of Barak et al. [9] which says that under a product distribution, if the
communication complexity of a two-player game is at most poly(t), then the information cost of the two- 
player game is at least the communication complexity of the two-player game up to a factor of poly log(i), 
we have $I(X, Y; \Pi) \ge \Omega(1/\varepsilon^2)$. ■

4.2 The $k$-XOR Problem

In the $k$-XOR problem we have $k$ sites $S_1, S_2, \ldots, S_k$. Each site $S_i$ ($i = 1, 2, \ldots, k$) holds a block $b_i = \{b_{i,1}, b_{i,2}, \ldots, b_{i,n}\}$ of $n$ ($n \ge k^{1+\Omega(1)}$) bits. Let $b = (b_1, b_2, \ldots, b_k)$ be the union of the inputs of the $k$ sites. We assume $k \ge 4$ is a power of 2. The $k$ sites want to compute the following function in the blackboard model.

$$k\text{-XOR}(b_1, \ldots, b_k) = \begin{cases} 1, & \text{if there exists a coordinate } j \in [n] \text{ such that } b_{i,j} = 1 \text{ for exactly } k/2 \text{ } i\text{'s}, \\ 0, & \text{otherwise.} \end{cases}$$
We define the input distribution for the $k$-XOR problem as follows. For each coordinate $\ell$ ($\ell \in [n]$) there is a variable $D_\ell$ chosen uniformly at random from $\{1, 2, \ldots, k\}$. Conditioned on $D_\ell$, all but the $D_\ell$-th site set their inputs to 0, whereas the $D_\ell$-th site sets its input to 0 or 1 with equal probability. We call the $D_\ell$-th site the special site in the $\ell$-th coordinate. Let $\varphi_1$ denote this input distribution on one coordinate.

Next, we choose a random special coordinate $M \in [n]$ and replace the $k$ sites' inputs on the $M$-th coordinate as follows: for the first $k/2$ sites, with probability $1/2$ we replace all $k/2$ sites' inputs with 0, and with probability $1/2$ we replace all $k/2$ sites' inputs with 1; and we independently perform the same operation on the second $k/2$ sites. Let $\psi_1$ denote the distribution on this special coordinate. And let $\psi_n$ denote the input distribution that on the special coordinate $M$ is distributed as $\psi_1$ and on each of the remaining $n - 1$ coordinates is distributed as $\varphi_1$.

Let $B, B_i, B_{i,\ell}$ be the corresponding random variables of $b, b_i, b_{i,\ell}$ when the input of $k$-XOR is chosen according to the distribution $\psi_n$. Let $D = \{D_1, D_2, \ldots, D_n\}$. Let $X = 1$ if the inputs of the first $k/2$ sites in the special coordinate $M$ are all 1 and $X = 0$ otherwise. Let $Y = 1$ if the inputs of the second $k/2$ sites in the special coordinate $M$ are all 1 and $Y = 0$ otherwise. It is easy to see that under $\psi_n$ we have $k\text{-XOR}(B) = X \oplus Y$. We say the instance $B$ is a 00-instance if $X = Y = 0$, a 10-instance if $X = 1$ and $Y = 0$, a 01-instance if $X = 0$ and $Y = 1$, and a 11-instance if $X = Y = 1$. Let $S \in \{00, 01, 10, 11\}$ be the type of the instance.
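The identity $k\text{-XOR}(B) = X \oplus Y$ can be checked empirically by sampling from the input distribution just described. The sketch below is illustrative only: the names `sample_k_xor_input` and `k_xor` are hypothetical, sites and coordinates are 0-indexed, and the input is stored as a list of $k$ rows (one per site) of $n$ bits.

```python
import random


def sample_k_xor_input(k: int, n: int, rng: random.Random):
    """Sample (b, m, d, x, y): one special site per coordinate holds a random
    bit, then the special coordinate m is overwritten block-wise by x and y."""
    if k < 4 or k % 2 != 0:
        raise ValueError("this sketch assumes an even k >= 4")
    b = [[0] * n for _ in range(k)]
    d = []
    for col in range(n):
        special_site = rng.randrange(k)          # D_col, uniform over sites
        d.append(special_site)
        b[special_site][col] = rng.randint(0, 1)
    m = rng.randrange(n)                         # special coordinate M
    x, y = rng.randint(0, 1), rng.randint(0, 1)
    for site in range(k):
        b[site][m] = x if site < k // 2 else y   # first half gets x, second half y
    return b, m, d, x, y


def k_xor(b) -> int:
    """1 if some coordinate has exactly k/2 sites holding a 1, else 0."""
    k, n = len(b), len(b[0])
    return 1 if any(sum(b[i][j] for i in range(k)) == k // 2 for j in range(n)) else 0


if __name__ == "__main__":
    rng = random.Random(1)
    b, m, d, x, y = sample_k_xor_input(4, 12, rng)
    print(m, x, y, k_xor(b))
```

Non-special coordinates contain at most a single 1, so for $k \ge 4$ only the special coordinate can reach exactly $k/2$ ones, which is why the function value equals $X \oplus Y$.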

Theorem 4 Let $\Pi$ be the transcript of any protocol on input distribution $\psi_n$ for which $I(X, Y; \Pi) = \Omega(1)$. Then, $I(B; \Pi \mid M, D, S) = \Omega(n/k)$, where information is measured¹ with respect to the input distribution $\psi_n$.

Proof: Since $I(X, Y; \Pi) = \Omega(1)$, we have

$$\Omega(1) = I(X, Y; \Pi) = H(X, Y) - H(X, Y \mid \Pi) = 2 - H(X, Y \mid \Pi),$$

or $H(X, Y \mid \Pi) = 2 - \Omega(1)$. By the Maximum Likelihood Principle in Proposition 1, there is a reconstruction function $g$ from the transcript of $\Pi$ for which the error probability $\delta_g$ satisfies

$$\delta_g \le 1 - \frac{1}{2^{H(X,Y \mid \Pi)}} \le 1 - \frac{1}{2^{2 - \Omega(1)}} = 1 - \frac{2^{\Omega(1)}}{4} \le 1 - \frac{1 + \Omega(1)}{4} = \frac{3}{4} - \Omega(1),$$

and therefore the success probability of the reconstruction function $g$ over inputs $X, Y$ is $\frac{1}{4} + \Omega(1)$. Since $g$ is deterministic given the transcript $\Pi$, we abuse notation and say the success probability of $\Pi$ is $\frac{1}{4} + \Omega(1)$.

For an $\ell \in [n]$, say $\ell$ is good if $\Pr[\Pi(B) = (X, Y) \mid M = \ell] = 1/4 + \Omega(1)$. By averaging, there are $\Omega(n)$ good $\ell$.

By the chain rule, expanding the conditioning, and letting $D^{-\ell}$ denote the random variable $D$ with the $\ell$-th component missing, and $B_{[k],<\ell}$ and $B_{[k],\ell}$ the inputs to the $k$ sites on the first $\ell - 1$ coordinates and the $\ell$-th coordinate, respectively, we have

$$I(B; \Pi \mid D, S, M) = \sum_{\ell=1}^{n} I(B_{[k],\ell}; \Pi \mid D, S, M, B_{[k],<\ell}) \ge \sum_{\text{good } \ell} I(B_{[k],\ell}; \Pi \mid D, S, M, B_{[k],<\ell}),$$



'when we say that the information is measured with respect to a distribution a we mean that the inputs to the protocol are 
distributed according to a when computing the mutual information. 
which is

$$\sum_{\text{good } \ell} \mathbf{E}_{b,d}\big[ I(B_{[k],\ell}; \Pi \mid D_\ell, S, M, D^{-\ell} = d, B_{[k],<\ell} = b) \big].$$

Say a pair $(b, d)$ is good for a good $\ell$ if

$$\Pr[\Pi(B) = (X, Y) \mid M = \ell, D^{-\ell} = d, B_{[k],<\ell} = b] = 1/4 + \Omega(1).$$

By a Markov argument,

$$\Pr[(b, d) \text{ is good}] = \Omega(1).$$

We therefore have that $I(B; \Pi \mid D, S, M)$ is at least

$$\Omega(1) \cdot \sum_{\text{good } \ell} I(B_{[k],\ell}; \Pi \mid D_\ell, S, M, D^{-\ell} = d, B_{[k],<\ell} = b, (b, d) \text{ is good}).$$

Now define a protocol $\Pi_{b,d}$ which on input $A_1, \ldots, A_k$ distributed according to $\psi_1$, attempts to output $(U, V)$, where $U = 1$ if $A_1 = \cdots = A_{k/2} = 1$ and $U = 0$ otherwise, and $V = 1$ if $A_{k/2+1} = \cdots = A_k = 1$ and $V = 0$ otherwise. The protocol $\Pi_{b,d}$ has $b$ and $d$ hardwired into it. It fills in the inputs for coordinates $\ell' > \ell$ by using the value $d$ and the fact that the inputs to the parties are independent conditioned on $D^{-\ell} = d$. It fills in the inputs for coordinates $\ell' < \ell$ using the value $b$. This can all be done with no communication. Since $\ell$ is good and $(b, d)$ is good for $\ell$, it follows that $\Pr[\Pi_{b,d}(A_1, \ldots, A_k) = (U, V)] = \frac{1}{4} + \Omega(1)$.

Hence, $I(B; \Pi \mid D, S, M) = \Omega(n) \cdot I(A_1, \ldots, A_k; \Pi' \mid R, S, M)$, where $\Pi'$ is a (randomized) protocol which succeeds in outputting $(U, V)$ with probability $1/4 + \Omega(1)$ when $A_1, \ldots, A_k$ are distributed as in $\psi_1$, and $R \in [k]$ is chosen uniformly at random and independently of $S, M, A_1, \ldots, A_k$, and the private randomness of $\Pi'$. The information is measured with respect to the marginal distribution of $\psi_n$ on a good coordinate $\ell$. Observe that

$$I(A_1, \ldots, A_k; \Pi' \mid R, M, S) = \frac{1}{4} \sum_{s \in \{00, 01, 10, 11\}} I(A_1, \ldots, A_k; \Pi' \mid R, M, S = s),$$

and so

$$I(B; \Pi \mid D, S, M) \ge \Omega(n) \cdot I(A_1, \ldots, A_k; \Pi' \mid R, M, S = 00).$$

Let $\mathcal{E}$ be the event that all sites have the value 0 in the $M$-th coordinate when the inputs are drawn from $\psi_n$. Observe that $(\psi_n \mid \mathcal{E}) = (\psi_n \mid S = 00)$ as distributions, and so

$$I(B; \Pi \mid D, S, M) \ge \Omega(n) \cdot I(A_1, \ldots, A_k; \Pi' \mid R, M, \mathcal{E}),$$

where the information on the left hand side is measured with respect to inputs $B$ drawn from $\psi_n$, and the information on the right hand side is measured with respect to inputs $A_1, \ldots, A_k$ drawn from $\psi_n$. Observe that $\Pr[M = \ell] = 1/n$, and so

$$I(B; \Pi \mid D, S, M) \ge \Omega(n) \cdot \frac{n - 1}{n} \cdot I(A_1, \ldots, A_k; \Pi' \mid R, M \ne \ell, \mathcal{E}) \ge \Omega(n) \cdot I(A_1, \ldots, A_k; \Pi' \mid R, M \ne \ell, \mathcal{E}).$$
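By definition of the mutual information, and using that $A_1, \ldots, A_k$ are independent of $\mathcal{E}$ given $M \ne \ell$,

$$\begin{aligned}
I(A_1, \ldots, A_k; \Pi' \mid R, M \ne \ell, \mathcal{E})
&= H(A_1, \ldots, A_k \mid R, M \ne \ell, \mathcal{E}) - H(A_1, \ldots, A_k \mid \Pi', R, M \ne \ell, \mathcal{E}) \\
&\ge H(A_1, \ldots, A_k \mid R, M \ne \ell) - H(A_1, \ldots, A_k \mid \Pi', R, M \ne \ell) \\
&= I(A_1, \ldots, A_k; \Pi' \mid R, M \ne \ell).
\end{aligned}$$

Notice that $I(A_1, \ldots, A_k; \Pi' \mid R, M \ne \ell)$ is equal to $I(A_1, \ldots, A_k; \Pi' \mid R)$ where the information is measured with respect to the input distribution $\varphi_1$, and $\Pi'$ is a protocol which succeeds with probability $1/4 + \Omega(1)$ on $\psi_1$.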

It remains to show that $I(A_1, \ldots, A_k; \Pi' \mid R) = \Omega(1/k)$ where the information is measured with respect to $\varphi_1$. Let $\mathbf{0}$ be the all-0 vector, $\mathbf{1}$ be the all-1 vector and $e_i$ be the standard basis vector with the $i$-th coordinate being 1. By the relationship between mutual information and Hellinger distance (see Proposition 2.51 and Proposition 2.53 of [7]), we have

$$I(A_1, \ldots, A_k; \Pi' \mid R) = (1/k) \sum_{i \in [k]} I(A_1, \ldots, A_k; \Pi' \mid R = i) = \Omega(1/k) \sum_{i \in [k]} h^2(\Pi'(\mathbf{0}), \Pi'(e_i)),$$

where $h(\cdot, \cdot)$ is the Hellinger distance (see Section 2 for a definition). Now we assume $k$ and $k/2$ are powers of 2, and we use Theorem 7 of [36], which says that the following three statements hold:

1. $\sum_{i \in [k]} h^2(\Pi'(\mathbf{0}), \Pi'(e_i)) = \Omega(1) \cdot h^2(\Pi'(\mathbf{0}), \Pi'(1^{k/2}0^{k/2}))$

2. $\sum_{i \in [k]} h^2(\Pi'(\mathbf{0}), \Pi'(e_i)) = \Omega(1) \cdot h^2(\Pi'(\mathbf{0}), \Pi'(0^{k/2}1^{k/2}))$

3. $\sum_{i \in [k]} h^2(\Pi'(\mathbf{0}), \Pi'(e_i)) = \Omega(1) \cdot h^2(\Pi'(\mathbf{0}), \Pi'(\mathbf{1}))$

It follows that

$$I(A_1, \ldots, A_k; \Pi' \mid R) = \Omega(1/k) \cdot \Big( h^2(\Pi'(\mathbf{0}), \Pi'(1^{k/2}0^{k/2})) + h^2(\Pi'(\mathbf{0}), \Pi'(0^{k/2}1^{k/2})) + h^2(\Pi'(\mathbf{0}), \Pi'(\mathbf{1})) \Big).$$

By the Cauchy-Schwarz inequality we have

$$I(A_1, \ldots, A_k; \Pi' \mid R) = \Omega(1/k) \cdot \Big( h(\Pi'(\mathbf{0}), \Pi'(1^{k/2}0^{k/2})) + h(\Pi'(\mathbf{0}), \Pi'(0^{k/2}1^{k/2})) + h(\Pi'(\mathbf{0}), \Pi'(\mathbf{1})) \Big)^2.$$

We can rewrite this as

$$I(A_1, \ldots, A_k; \Pi' \mid R) = \Omega(1/k) \cdot \Big( 3h(\Pi'(\mathbf{0}), \Pi'(1^{k/2}0^{k/2})) + 3h(\Pi'(\mathbf{0}), \Pi'(0^{k/2}1^{k/2})) + 3h(\Pi'(\mathbf{0}), \Pi'(\mathbf{1})) \Big)^2.$$

Now by the triangle inequality of Hellinger distance (which is just the Euclidean norm of the so-called transcript wave function, see [36]), we obtain the following,

$$I(A_1, \ldots, A_k; \Pi' \mid R) = \Omega(1/k) \cdot \Big( \sum_{a, b \in \{\mathbf{0},\, \mathbf{1},\, 1^{k/2}0^{k/2},\, 0^{k/2}1^{k/2}\}} h(\Pi'(a), \Pi'(b)) \Big)^2. \qquad (6)$$

The claim is that at least one of the $h(\Pi'(a), \Pi'(b))$ in the RHS of Equation (6) is $\Omega(1)$, and this will complete the proof. By Proposition 3, this is true if the total variation distance between $\Pi'(a)$ and $\Pi'(b)$ is $\Omega(1)$ for some $a, b \in \{\mathbf{0}, \mathbf{1}, 1^{k/2}0^{k/2}, 0^{k/2}1^{k/2}\}$, and there must be such a pair $(a, b)$, as otherwise $\Pi'$ cannot succeed with probability $1/4 + \Omega(1)$ on distribution $\psi_1$ (since it cannot distinguish different outputs), violating its success probability guarantee. ■
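4.3 The $k$-BTX Problem

The input of the $k$-BTX problem is a concatenation of $1/\varepsilon^2$ copies of inputs of the $k$-XOR problem. That is, each site $S_i$ ($i = 1, 2, \ldots, k$) holds an input consisting of $1/\varepsilon^2$ blocks, each of which is an input for a site in the $k$-XOR problem. More precisely, each $S_i$ ($i \in [k]$) holds an input $b_i = \{b_i^1, b_i^2, \ldots, b_i^{1/\varepsilon^2}\}$ where $b_i^j = \{b_{i,1}^j, b_{i,2}^j, \ldots, b_{i,n}^j\}$ ($j \in [1/\varepsilon^2]$) is a vector of $n$ ($n \ge k^{1+\Omega(1)}$) bits. Let $b = \{b_1, b_2, \ldots, b_k\}$ be the union of the inputs of the $k$ sites. In the $k$-BTX problem the $k$ sites want to compute the following.

$$k\text{-BTX}(b_1, \ldots, b_k) = \begin{cases} 1, & \text{if } \Big|\sum_{j \in [1/\varepsilon^2]} k\text{-XOR}(b_1^j, \ldots, b_k^j) - 1/(2\varepsilon^2)\Big| \ge 2/\varepsilon, \\ 0, & \text{if } \Big|\sum_{j \in [1/\varepsilon^2]} k\text{-XOR}(b_1^j, \ldots, b_k^j) - 1/(2\varepsilon^2)\Big| \le 1/\varepsilon, \\ *, & \text{otherwise.} \end{cases}$$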

We define the input distribution $\nu$ for the $k$-BTX problem as follows: the input of the $k$ sites in each block is chosen independently according to the input distribution $\psi_n$, which is defined for the $k$-XOR problem. Let $B, B_i, B_i^j, B_{i,\ell}^j$ be the corresponding random variables of $b, b_i, b_i^j, b_{i,\ell}^j$ when the input of $k$-BTX is chosen according to the distribution $\nu$. Let $D^j = \{D_1^j, D_2^j, \ldots, D_n^j\}$ where $D_\ell^j$ ($\ell \in [n], j \in [1/\varepsilon^2]$) is the special site in the $\ell$-th coordinate of block $j$, and let $D = \{D^1, D^2, \ldots, D^{1/\varepsilon^2}\}$. Let $M = \{M^1, M^2, \ldots, M^{1/\varepsilon^2}\}$ where $M^j$ is the special coordinate in block $j$. Let $S = \{S^1, S^2, \ldots, S^{1/\varepsilon^2}\}$ where $S^j \in \{00, 01, 10, 11\}$ is the type of the $k$-XOR instance in block $j$.

For each block $j$ ($j \in [1/\varepsilon^2]$), let $X^j = 1$ if the inputs of the first $k/2$ sites in the special coordinate $M^j$ are all 1 and $X^j = 0$ otherwise; and similarly let $Y^j = 1$ if the inputs of the second $k/2$ sites in the coordinate $M^j$ are all 1 and $Y^j = 0$ otherwise. Let $X = \{X^1, X^2, \ldots, X^{1/\varepsilon^2}\}$ and $Y = \{Y^1, Y^2, \ldots, Y^{1/\varepsilon^2}\}$. We first show the following theorem.

Theorem 5 Let $\Pi$ be the transcript of any protocol for $k$-BTX on input distribution $\nu$ with error probability $\delta$ for a sufficiently small constant $\delta > 0$. Then $I(X, Y; \Pi) = \Omega(1/\varepsilon^2)$, where the information is measured with respect to the uniform distribution on $X, Y$.

Proof: Consider the following randomized 2-player protocol $\Pi'$ for 2-GAP-ORT, where the error probability is over both the coin tosses of $\Pi'$ and the uniform distribution $\phi$ on inputs $(X, Y)$. Alice and Bob run $\Pi$, with Alice controlling the first $k/2$ players, and Bob controlling the second $k/2$ players. Alice and Bob use the public coin to generate $M^j$ and $D^j$ values for each $j \in [1/\varepsilon^2]$. For each $j \in [1/\varepsilon^2]$, Alice sets the $M^j$-th coordinate of each of the first $k/2$ players to $X^j$. Similarly, Bob sets the $M^j$-th coordinate of each of the last $k/2$ players to $Y^j$. Alice and Bob then use private randomness and the $D^j$ vectors to fill in the remaining coordinates. Observe that the resulting inputs are distributed according to $\nu$ for $k$-BTX by definition of $\nu$ and the fact that $(X, Y)$ is uniformly distributed.

Alice and Bob run the deterministic protocol $\Pi$. Every time a message is sent between any two of the $k$ players in $\Pi$, it is appended to the transcript. That is, if the two players are among the first $k/2$, Alice still forwards this message to Bob. If the two players are among the last $k/2$, Bob still forwards this message to Alice. If the message is between a player in the first group and the second group, Alice and Bob exchange a message. The output of $\Pi'$ is equal to that of $\Pi$. Let $\mathit{rand}$ denote the randomness used in $\Pi'$, which, since $\Pi$ is deterministic, is just the randomness used to help create the inputs to $\Pi$. Note that $\mathit{rand}$ consists of public and private randomness. Let $\Pi'_{\mathit{rand}}(X, Y)$ denote the induced deterministic protocol we obtain by hardwiring $\mathit{rand}$.
By a Markov argument, if $\Pi$ succeeds with probability at least $1 - \delta$, then for at least a $1/2$ fraction of choices of $\mathit{rand}$, $\Pr_{X,Y}[\Pi'_{\mathit{rand}}(X, Y) = \text{2-GAP-ORT}(X, Y)] \ge 1 - 2\delta$. By definition,

$$I(X, Y; \Pi(X, Y)) = I(X, Y; \Pi'(X, Y, \mathit{rand})),$$

where $\mathit{rand}$ is not included in the transcript of $\Pi'$. By definition of the mutual information,

$$I(X, Y; \Pi'(X, Y, \mathit{rand})) = \mathbf{E}_{\mathit{rand}}\Big[\mathbf{E}_{\Pi'_{\mathit{rand}}(X,Y)}\big[D_{KL}\big(p(X, Y \mid \Pi'_{\mathit{rand}}(X, Y)) \,\|\, p(X, Y)\big)\big]\Big],$$

where $D_{KL}(p, q)$ is the KL-divergence between distributions $p$ and $q$, and $p(V)$ for a random variable $V$ denotes its distribution. By a Markov argument, for at least a $2/3$ fraction of random strings $\mathit{rand}$,

$$I(X, Y; \Pi'_{\mathit{rand}}(X, Y)) = \mathbf{E}_{\Pi'_{\mathit{rand}}(X,Y)}\big[D_{KL}\big(p(X, Y \mid \Pi'_{\mathit{rand}}(X, Y)) \,\|\, p(X, Y)\big)\big] \le 3 \cdot I(X, Y; \Pi(X, Y)).$$

By a union bound, there exists a setting of $\mathit{rand}$ for which we have

$$\Pr[\Pi'_{\mathit{rand}}(X, Y) = \text{2-GAP-ORT}(X, Y)] \ge 1 - 2\delta, \qquad (7)$$

and

$$I(X, Y; \Pi'_{\mathit{rand}}(X, Y)) \le 3\, I(X, Y; \Pi(X, Y)). \qquad (8)$$

Since $\Pi'_{\mathit{rand}}$ is deterministic, it follows by (7) and Theorem 3 that $I(X, Y; \Pi'_{\mathit{rand}}(X, Y)) = \Omega(1/\varepsilon^2)$, and hence by (8), $I(X, Y; \Pi(X, Y)) = \Omega(1/\varepsilon^2)$, which completes the proof. ■



Now we are ready to prove our main theorem for $k$-BTX.

Theorem 6 Let $\Pi$ be the transcript of any protocol for $k$-BTX on input distribution $\nu$ with error probability $\delta$ for a sufficiently small constant $\delta > 0$. We have $I(B; \Pi \mid M, D, S) \ge \Omega(n/(k\varepsilon^2))$ for any $n \ge k^{1+\Omega(1)}$, where the information is measured with respect to the input distribution $\nu$.

Proof: By Theorem 5 we have $I(X, Y; \Pi) = \Omega(1/\varepsilon^2)$. Using the chain rule we obtain that

$$I(X^j, Y^j; \Pi \mid X^{<j}, Y^{<j}) = \Omega(1)$$

for at least $\Omega(1/\varepsilon^2)$ $j$'s, where $X^{<j} = \{X^1, X^2, \ldots, X^{j-1}\}$ and similarly for $Y^{<j}$. We say such a $j$ for which this holds is good.

Now we consider a good $j \in [1/\varepsilon^2]$. We show that $I(B^j; \Pi \mid M, D, S, B^{<j}) = \Omega(n/k)$ if $j$ is good. Since $B^{<j}$ determines $(X^{<j}, Y^{<j})$ and $B^{<j}$ is independent of $B^j$, by the third part of Proposition 1, it suffices to prove that $I(B^j; \Pi \mid M, D, S, X^{<j}, Y^{<j}) = \Omega(n/k)$. By expanding the conditioning, we can write $I(B^j; \Pi \mid M, D, S, X^{<j}, Y^{<j})$ as

$$\mathbf{E}_{m,d,s,x,y}\big[I(B^j; \Pi \mid M^j, D^j, S^j, M^{-j} = m, D^{-j} = d, S^{-j} = s, X^{<j} = x, Y^{<j} = y)\big].$$

For each $m, d, s, x, y$, we define a randomized protocol $\Pi_{m,d,s,x,y}$ for computing $X^j, Y^j$ on distribution $\psi_n$. Suppose the $k$ sites are given inputs $a_1, a_2, \ldots, a_k$ chosen randomly according to $\psi_n$. For each $i \in [k]$ the $i$-th site sets $B_i^j = a_i$. The $k$ sites set the remaining inputs as follows. Independently for each block $j' \ne j$, conditioned on $M^{j'}$, $D^{j'}$ and $S^{j'}$, the $k$ sites sample the input $B^{j'}$ randomly and independently according to $\psi_n$, using their private random coins (note that $S^{-j}$ determines $X^{<j}$ and $Y^{<j}$). Finally the $k$ sites run $\Pi$ on $B$ and define

$$\Pi_{m,d,s,x,y}(a_1, \ldots, a_k) = \Pi(B).$$

By the definition of a good $j \in [1/\varepsilon^2]$ we know by a Markov bound that with probability $\Omega(1)$ over the choice of $(x, y)$ from the uniform distribution, if $(X^{<j}, Y^{<j}) = (x, y)$ then we have

$$I(X^j, Y^j; \Pi \mid X^{<j} = x, Y^{<j} = y) = \Omega(1).$$

Call these $(x, y)$ for which this holds good. Now for a good pair $(x, y)$, we say a tuple $(m, d, s)$ is good if

$$I(X^j, Y^j; \Pi \mid M^{-j} = m, D^{-j} = d, S^{-j} = s, X^{<j} = x, Y^{<j} = y) = \Omega(1).$$

Since $I(X^j, Y^j; \Pi \mid X^{<j} = x, Y^{<j} = y) = \Omega(1)$ for a good pair $(x, y)$, by another Markov bound we have that $\Pr_{m,d,s}[(m, d, s) \text{ is good}] = \Omega(1)$. Combining the above arguments with Theorem 4, we obtain

$$\begin{aligned}
I(B^j; \Pi \mid M, D, S, B^{<j})
&\ge I(B^j; \Pi \mid M, D, S, X^{<j}, Y^{<j}) \\
&= \mathbf{E}_{m,d,s,x,y}\big[I(B^j; \Pi \mid M^j, D^j, S^j, M^{-j} = m, D^{-j} = d, S^{-j} = s, X^{<j} = x, Y^{<j} = y)\big] \\
&\ge \Omega(1) \cdot \mathbf{E}_{m,d,s,x,y}\big[I(B^j; \Pi \mid M^j, D^j, S^j, (M^{-j}, D^{-j}, S^{-j}) = (m, d, s), (X^{<j}, Y^{<j}) = (x, y), \\
&\qquad\qquad (m, d, s) \text{ is good}, (x, y) \text{ is good})\big] \\
&= \Omega(n/k) \quad \text{(by Theorem 4)}.
\end{aligned}$$

By the chain rule, the fact that there are $\Omega(1/\varepsilon^2)$ good $j \in [1/\varepsilon^2]$, and part 3 of Proposition 1,

$$\begin{aligned}
I(B; \Pi \mid M, D, S)
&\ge \sum_{j \in [1/\varepsilon^2] \,\wedge\, j \text{ is good}} I(B^j; \Pi \mid M, D, S, B^{<j}) \\
&\ge \sum_{j \in [1/\varepsilon^2] \,\wedge\, j \text{ is good}} I(B^j; \Pi \mid M, D, S) \\
&\ge \Omega(n/(k\varepsilon^2)).
\end{aligned}$$

This completes the proof. ■

By Proposition 2, which says that the randomized communication complexity is always at least the conditional information cost, we have the following immediate corollary.

Corollary 2 Any protocol that computes $k$-BTX on input distribution $\nu$ with error probability $\delta$ for some sufficiently small constant $\delta$ has communication complexity $\Omega(n/(k\varepsilon^2))$.

4.4 The Complexity of $F_p$ ($p > 1$)

The input of $\varepsilon$-approximate $F_p$ ($p > 1$) is chosen to be the same as $k$-BTX by setting $n = k^p$. That is, we choose $\{b_1, b_2, \ldots, b_k\}$ randomly according to distribution $\nu$. $b_i$ is the input vector for site $S_i$ consisting of $1/\varepsilon^2$ blocks each having $n = k^p$ coordinates. We prove the lower bound of $F_p$ by performing a reduction from $k$-BTX.

Lemma 6 If there exists a protocol $\mathcal{P}'$ that computes a $(1 + \alpha\varepsilon)$-approximate $F_p$ ($p > 1$) for a sufficiently small constant $\alpha$ on input distribution $\nu$ with communication complexity $C$ and error probability at most $\delta$, then there exists a protocol $\mathcal{P}$ for $k$-BTX on input distribution $\nu$ with communication complexity $C$ and error probability at most $3\delta + \sigma$, where $\sigma$ is an arbitrarily small constant.

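Proof: We pick a random input $B = \{B_1, B_2, \ldots, B_k\}$ from distribution $\nu$. Each coordinate (column) of $B$ represents an item. Thus we have a total of $(1/\varepsilon^2) \cdot k^p = k^p/\varepsilon^2$ possible items. If we view each input vector $B_i$ ($i \in [k]$) as a set, then each site has a subset of $[k^p/\varepsilon^2]$ corresponding to these 1 bits. Let $W_0$ be the exact value of $F_p(B)$. $W_0$ can be written as the sum of three components:

$$W_0 = \left(\frac{k^p - 1}{2\varepsilon^2} + Q\right) \cdot 1^p + \left(\frac{1}{2\varepsilon^2} + U\right) \cdot (k/2)^p + \left(\frac{1}{4\varepsilon^2} + V\right) \cdot k^p, \qquad (9)$$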



where $Q, U, V$ are random variables (it will be clear why we write it this way in what follows). The first term of the RHS of Equation (9) is the contribution of non-special coordinates across all blocks in each of which one site has 1. The second term is the contribution of the special coordinates across all blocks in each of which $k/2$ sites have 1. The third term is the contribution of the special coordinates across all blocks in each of which all $k$ sites have 1.

Note that $k\text{-BTX}(b_1, b_2, \ldots, b_k)$ is 1 if $|U| \ge 2/\varepsilon$ and 0 if $|U| \le 1/\varepsilon$. Our goal is to use a protocol $\mathcal{P}'$ for $F_p$ to construct a protocol $\mathcal{P}$ for $k$-BTX such that we can differentiate the two cases (i.e., $|U| \ge 2/\varepsilon$ or $|U| \le 1/\varepsilon$) with a very good probability.

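Given a random input $B$, let $W_1$ be the exact $F_p$-value on the first $k/2$ sites, and $W_2$ be the exact $F_p$-value on the second $k/2$ sites. That is, $W_1 = F_p(B_1, B_2, \ldots, B_{k/2})$ and $W_2 = F_p(B_{k/2+1}, B_{k/2+2}, \ldots, B_k)$. We have

$$W_1 + W_2 = \left(\frac{k^p - 1}{2\varepsilon^2} + Q\right) \cdot 1^p + \left(\frac{1}{2\varepsilon^2} + U\right) \cdot (k/2)^p + \left(\frac{1}{4\varepsilon^2} + V\right) \cdot 2 \cdot (k/2)^p. \qquad (10)$$

By Equations (9) and (10) we can cancel out $V$:

$$2^{p-1}(W_1 + W_2) - W_0 = (2^{p-1} - 1)\left(\left(\frac{k^p - 1}{2\varepsilon^2} + Q\right) + \left(\frac{1}{2\varepsilon^2} + U\right) \cdot (k/2)^p\right). \qquad (11)$$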
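The cancellation of $V$ can be verified on concrete column counts. In the sketch below, a column with a single 1 contributes $1^p$ everywhere, a special column with $k/2$ ones contributes $(k/2)^p$ to both $W_0$ and $W_1 + W_2$, and a special column with $k$ ones contributes $k^p$ to $W_0$ but $2(k/2)^p$ to $W_1 + W_2$. The function names and the counts used in the demo are hypothetical; only the algebra mirrors the derivation above.

```python
def fp_values(singles: int, halves: int, fulls: int, k: int, p: float):
    """Return (W0, W1 + W2) for an input with the given numbers of columns
    holding one 1, exactly k/2 ones (all in one half), and k ones."""
    half = (k / 2) ** p
    w0 = singles + halves * half + fulls * float(k) ** p
    w1_plus_w2 = singles + halves * half + fulls * 2 * half
    return w0, w1_plus_w2


def recover_halves(w0: float, w1_plus_w2: float, singles: int, k: int, p: float) -> float:
    """Invert the combination 2^(p-1) * (W1 + W2) - W0 to recover the number
    of half-full columns (requires p > 1 so that 2^(p-1) - 1 is nonzero)."""
    combo = 2 ** (p - 1) * w1_plus_w2 - w0
    return (combo / (2 ** (p - 1) - 1) - singles) / (k / 2) ** p


if __name__ == "__main__":
    for p in (2, 3):
        w0, w12 = fp_values(singles=1000, halves=7, fulls=3, k=8, p=p)
        print(p, w0, w12, recover_halves(w0, w12, 1000, 8, p))
```

The recovered count does not depend on the number of all-ones columns, which is the point of forming this particular combination.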

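Let $\tilde{W}_0$, $\tilde{W}_1$ and $\tilde{W}_2$ be the estimated $W_0$, $W_1$ and $W_2$ obtained by running $\mathcal{P}'$ on the $k$ sites' inputs, the first $k/2$ sites' inputs and the second $k/2$ sites' inputs, respectively. Observe that $W_0 \le (2^p + 1)k^p/\varepsilon^2$ and $W_1, W_2 \le 2k^p/\varepsilon^2$. By properties of $\mathcal{P}'$ and the discussion above we have that with probability at least $1 - 3\delta$,

$$2^{p-1}(\tilde{W}_1 + \tilde{W}_2) - \tilde{W}_0 = 2^{p-1}(W_1 + W_2) - W_0 \pm \beta' k^p/\varepsilon, \qquad (12)$$

where $|\beta'| \le 3(2^p + 1)\alpha$.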

By a Chernoff bound we have that $|Q| \le c_1 k^{p/2}/\varepsilon$ with probability at least $1 - \sigma$, where $\sigma$ is an arbitrarily small constant and $c_1 \le \kappa \log^{1/2}(1/\sigma)$ for some universal constant $\kappa$. Combining this fact with Equations (11) and (12) and letting $\tilde{W} = (2^{p-1}(\tilde{W}_1 + \tilde{W}_2) - \tilde{W}_0)/(2^{p-1} - 1)$, we have that with probability at least $1 - 3\delta - \sigma$,

$$\frac{2^p \tilde{W}}{k^p} - \frac{2^p + 1}{2\varepsilon^2} = U + \frac{2^p \beta}{(2^{p-1} - 1)\varepsilon}, \qquad (13)$$

where $|\beta| \le 3(2^p + 1)\alpha + o(1)$.

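Protocol $\mathcal{P}$. Given an input $B$ for $k$-BTX, protocol $\mathcal{P}$ first uses $\mathcal{P}'$ to obtain the value $\tilde{W}$ described above, and then determines the answer to $k$-BTX as follows:

$$k\text{-BTX}(B) = \begin{cases} 1, & \text{if } \left| 2^p \tilde{W}/k^p - (2^p + 1)/(2\varepsilon^2) \right| > 1.5/\varepsilon, \\ 0, & \text{otherwise.} \end{cases}$$

Correctness. Note that with probability at least $1 - 3\delta - \sigma$, we have $|\beta| \le 3(2^p + 1)\alpha + o(1)$, where $\alpha > 0$ is a sufficiently small constant, and thus $\left|\frac{2^p \beta}{(2^{p-1} - 1)\varepsilon}\right| < 0.5/\varepsilon$. Therefore, in this case protocol $\mathcal{P}$ will always succeed. ■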



Theorem 6 (set $n = k^p$) and Lemma 6 directly imply the following main theorem for $F_p$.

Theorem 7 Any protocol that computes a $(1 + \varepsilon)$-approximate $F_p$ ($p > 1$) on input distribution $\nu$ with error probability $\delta$ for some sufficiently small constant $\delta$ has communication complexity $\Omega(k^{p-1}/\varepsilon^2)$.



5 An Upper Bound for $F_p$ ($p > 1$)

We describe the following protocol to give a factor $(1 + \Theta(\epsilon))$-approximation to $F_p$ at all points in time in
the union of $k$ streams each held by a different site. Each site has a non-negative vector $v^i \in \mathbb{R}^m$ which
evolves with time, and at all times the coordinator holds a $(1 + \Theta(\epsilon))$-approximation to $\|\sum_{i=1}^k v^i\|_p^p$. Let n
be the length of the union of the k streams. We assume n = poly(m), and that is a power of 2. 

As observed in [20], up to a factor of $O(\epsilon^{-1} \log n \log(\epsilon^{-1} \log n))$ in communication the problem is
equivalent to the threshold problem: given a threshold $\tau$, with probability at least 2/3: when $\|\sum_{i=1}^k v^i\|_p^p > \tau$,
the coordinator outputs 1, when $\|\sum_{i=1}^k v^i\|_p^p < \tau/(1 + \epsilon)$, the coordinator outputs 0, and for
$\tau/(1 + \epsilon) \le \|\sum_{i=1}^k v^i\|_p^p \le \tau$, the coordinator can output either 0 or 1.

We can thus assume we are given a threshold $\tau$ in the following algorithm description. For notational
convenience, define $\tau_\ell = \tau/2^\ell$ for an integer $\ell$. A nice property of the algorithm is that it is one-way, namely,
all communication is from the sites to the coordinator. For readability, we do not attempt to optimize the
poly$(\epsilon^{-1} \log n)$ factors in the complexity.



5.1 Our Protocol 

The protocol consists of four algorithms illustrated in Algorithm 1 to Algorithm 4. Let $v = \sum_{i=1}^k v^i$ at any
point in time during the union of the $k$ streams. At times we will make the following assumptions on the
algorithm parameters $\gamma$, $B$, and $r$: we assume $\gamma = \Theta(\epsilon)$ is sufficiently small, and $B = $ poly$(\epsilon^{-1} \log n)$ and
$r = \Theta(\log n)$ are sufficiently large.

We use $m$ instead of $N$ for universe size only in this section.

To see the equivalence, by independent repetition, we can assume the success probability of the protocol for the threshold
problem is $1 - \Theta(\epsilon/\log n)$. Then we can run a protocol for each $\tau = 1, (1+\epsilon), (1+\epsilon)^2, (1+\epsilon)^3, \ldots, O(n^p)$, and we are correct
on all instantiations with probability at least 2/3.
Algorithm 1: Interpretation of the random public coin by sites and the coordinator

$r = \Theta(\log n)$  /* A parameter used by the sites and coordinator */
for $z = 1, 2, \ldots, r$ do
    for $\ell = 0, 1, 2, \ldots, \log m$ do
        Create a set $S_\ell^z$ by including each coordinate in $[m]$ independently with probability $2^{-\ell}$.



Algorithm 2: Initialization at Coordinator

$\gamma = \Theta(\epsilon)$, $B = $ poly$(\epsilon^{-1} \log n)$. Choose $\eta \in [0, 1]$ uniformly at random  /* Parameters */
for $z = 1, 2, \ldots, r$ do
    for $\ell = 0, 1, 2, \ldots, \log m$ do
        for $j = 1, 2, \ldots, m$ do
            $f_{z,\ell,j} \leftarrow 0$  /* Initialize all frequencies seen to 0 */
out $\leftarrow 0$  /* The coordinator's current output */



Algorithm 3: When Site $i$ receives an update $v^i \leftarrow v^i + e_j$ for standard unit vector $e_j$

for $z = 1, 2, \ldots, r$ do
    for $\ell = 0, 1, 2, \ldots, \log m$ do
        if $j \in S_\ell^z$ and $v_j^i > \tau_\ell^{1/p}/(kB)$ then
            With probability $\min(B/\tau_\ell^{1/p}, 1)$, send $(j, z, \ell)$ to the coordinator



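Algorithm 4: Algorithm at Coordinator if a tuple $(j, z, \ell)$ arrives

$f_{z,\ell,j} \leftarrow f_{z,\ell,j} + \tau_\ell^{1/p}/B$
for $h = 0, 1, 2, \ldots, \log n$ do
    for $z = 1, 2, \ldots, r$ do
        Choose $\ell$ for which $2^\ell \le \frac{2^p \tau}{\eta^p (1+\gamma)^{ph} B} < 2^{\ell+1}$, or $\ell = 0$ if no such $\ell$ exists
        Let $F_{z,h} = \{j \in [m] \mid f_{z,\ell,j} \in [\eta(1+\gamma)^h, \eta(1+\gamma)^{h+1})\}$
    $c_h = \mathrm{median}_z\ 2^\ell \cdot |F_{z,h}|$
if $\sum_{h \ge 0} c_h \cdot \eta^p \cdot (1+\gamma)^{ph} > (1 - \epsilon)\tau$ then
    out $\leftarrow 1$
    Terminate the protocol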
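The site-side sampling rule (Algorithm 3) and the coordinator's frequency update (the first line of Algorithm 4) can be sketched as follows. This is a minimal simulation under stated assumptions: the class names `Site` and `Coordinator`, the helper `level_root`, and the representation of the shared sets as a dictionary keyed by $(z, \ell)$ are all hypothetical choices made for illustration, and the class-size estimation part of Algorithm 4 is omitted.

```python
import random


def level_root(tau: float, level: int, p: float) -> float:
    """Return tau_level^(1/p), where tau_level = tau / 2^level."""
    return (tau / 2 ** level) ** (1.0 / p)


class Coordinator:
    def __init__(self, tau: float, p: float, big_b: float):
        self.tau, self.p, self.big_b = tau, p, big_b
        self.f = {}  # (z, level, j) -> accumulated frequency estimate

    def receive(self, j: int, z: int, level: int) -> None:
        # Each received tuple adds tau_level^(1/p) / B to the estimate.
        key = (z, level, j)
        inc = level_root(self.tau, level, self.p) / self.big_b
        self.f[key] = self.f.get(key, 0.0) + inc


class Site:
    def __init__(self, k, tau, p, big_b, sets, rng):
        # sets maps (z, level) -> set of coordinates sampled from the public coin
        self.k, self.tau, self.p, self.big_b = k, tau, p, big_b
        self.sets, self.rng = sets, rng
        self.v = {}  # local frequency vector of this site

    def update(self, j: int, coordinator: Coordinator) -> int:
        """Process the update v <- v + e_j; return the number of messages sent."""
        self.v[j] = self.v.get(j, 0) + 1
        sent = 0
        for (z, level), members in self.sets.items():
            root = level_root(self.tau, level, self.p)
            if j in members and self.v[j] > root / (self.k * self.big_b):
                if self.rng.random() < min(self.big_b / root, 1.0):
                    coordinator.receive(j, z, level)
                    sent += 1
        return sent
```

When $B/\tau_\ell^{1/p} < 1$, the sampling probability and the increment multiply to 1, so each update past the local threshold contributes one unit in expectation.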



5.2 Communication Cost 

Lemma 7 Consider any setting of $v^1, \ldots, v^k$ for which $\|\sum_{i=1}^k v^i\|_p^p \le 2^p \cdot \tau$. Then the expected total
communication is $k^{p-1} \cdot$ poly$(\epsilon^{-1} \log n)$ bits.

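Proof: Fix any particular $z \in [r]$ and $\ell \in \{0, 1, \ldots, \log m\}$. Let $v_j^{i,\ell}$ equal $v_j^i$ if $j \in S_\ell^z$ and equal 0
otherwise. Let $v^{i,\ell}$ be the vector with coordinates $v_j^{i,\ell}$ for $j \in [m]$. Also let $v^\ell = \sum_{i=1}^k v^{i,\ell}$. Observe that
$E[\|v^\ell\|_p^p] = 2^{-\ell} \cdot \|v\|_p^p \le 2^p \tau_\ell$. Because of non-negativity of the $v^i$,

$$\sum_{i=1}^k \sum_{j \in S_\ell^z} (v_j^i)^p \le \Big\|\sum_{i=1}^k v^{i,\ell}\Big\|_p^p = \|v^\ell\|_p^p.$$

Notice that a $j \in S_\ell^z$ is sent by a site with probability at most $B/\tau_\ell^{1/p}$ and only if $(v_j^i)^p \ge \frac{\tau_\ell}{k^p B^p}$. Hence the expected number of
messages sent for this $z$ and $\ell$, over all randomness, is at most

$$\frac{B}{\tau_\ell^{1/p}} \cdot \frac{E[\|v^\ell\|_p^p]}{\tau_\ell/(k^p B^p)} \cdot \frac{\tau_\ell^{1/p}}{kB} \le \frac{2^p \cdot \tau_\ell \cdot k^{p-1} \cdot B^p}{\tau_\ell} = 2^p \cdot k^{p-1} \cdot B^p,$$

where we used that $\sum_{i,j} v_j^{i,\ell}$ is maximized subject to $(v_j^{i,\ell})^p \ge \frac{\tau_\ell}{k^p B^p}$ and $\sum_{i,j} (v_j^{i,\ell})^p \le \|v^\ell\|_p^p$ when all the $v_j^{i,\ell}$
are equal to $\tau_\ell^{1/p}/(kB)$. Summing over all $z$ and $\ell$, it follows that the expected number of messages sent
in total is $O(k^{p-1} B^p \log^2 n)$. Since each message is $O(\log n)$ bits, the expected number of bits is $k^{p-1} \cdot$
poly$(\epsilon^{-1} \log n)$. ■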
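The counting in the proof above can be turned into a small calculator. It is a sketch only: the function name and parameters are hypothetical, the per-pair bound $2^p k^{p-1} B^p$ is taken from the argument above, and the number of levels is assumed to be $\log_2 m + 1$ with $m$ a power of 2.

```python
import math


def expected_message_bound(k: int, p: float, big_b: float, r: int, m: int) -> float:
    """Upper bound on the expected number of messages over all (z, level) pairs.

    Each of the r * (log2(m) + 1) pairs contributes at most 2^p * k^(p-1) * B^p.
    """
    levels = int(math.log2(m)) + 1  # levels 0, 1, ..., log m
    per_pair = (2.0 ** p) * (k ** (p - 1)) * (big_b ** p)
    return r * levels * per_pair
```

For $p = 2$ the bound is linear in $k$, so doubling the number of sites doubles it.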



5.3 Correctness 

We let $C > 0$ be a sufficiently large constant.



5.3.1 Concentration of Individual Frequencies 

We shall make use of the following standard multiplicative Chernoff bound.

Fact 1 Let $X_1, \ldots, X_s$ be i.i.d. Bernoulli($q$) random variables. Then for all $0 < \beta < 1$,

$$\Pr\left[\Big|\sum_{i=1}^s X_i - qs\Big| \ge \beta q s\right] \le 2 \cdot e^{-\beta^2 q s/3}.$$
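As a sanity check on Fact 1, the sketch below evaluates the bound $2e^{-\beta^2 qs/3}$ and compares it with the exact two-sided binomial tail for small parameters. The function names are hypothetical and the comparison is only illustrative.

```python
import math


def chernoff_tail_bound(beta: float, q: float, s: int) -> float:
    """Right-hand side of Fact 1: 2 * exp(-beta^2 * q * s / 3)."""
    if not 0 < beta < 1:
        raise ValueError("Fact 1 is stated for 0 < beta < 1")
    return 2.0 * math.exp(-beta * beta * q * s / 3.0)


def exact_two_sided_tail(beta: float, q: float, s: int) -> float:
    """Exact Pr[|X - qs| >= beta*q*s] for X ~ Binomial(s, q)."""
    mean = q * s
    return sum(
        math.comb(s, i) * q ** i * (1 - q) ** (s - i)
        for i in range(s + 1)
        if abs(i - mean) >= beta * mean
    )
```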



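Lemma 8 For a sufficiently large constant $C > 0$, with probability $1 - n^{-\Omega(C)}$, for all $z$, $\ell$, $j \in S_\ell^z$, and all
times in the union of the $k$ streams,

1. $f_{z,\ell,j} \le 2e \cdot v_j + \frac{C \tau_\ell^{1/p} \log n}{B}$, and

2. if $v_j \ge \frac{C (\log^5 n) \tau_\ell^{1/p}}{B \gamma^{10}}$, then $|f_{z,\ell,j} - v_j| \le \frac{\gamma^5}{\log^2 n} \cdot v_j$.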



Proof: Fix a particular time in the stream. Let $g_{z,\ell,j} = f_{z,\ell,j} \cdot B/\tau_\ell^{1/p}$. Then $g_{z,\ell,j}$ is a sum of indicator
variables, where the number of indicator variables depends on the values of the $v_j^i$. The indicator variables
are independent, each with expectation $\min(B/\tau_\ell^{1/p}, 1)$.

First part of lemma. The number $s$ of indicator variables is at most $v_j$, and the expectation of each is at
most $B/\tau_\ell^{1/p}$. Hence, the probability that $w = 2e \cdot v_j \cdot B/\tau_\ell^{1/p} + C \log n$ or more of them equal 1 is at most

$$\binom{s}{w} \left(\frac{B}{\tau_\ell^{1/p}}\right)^w \le \left(\frac{e\, v_j B}{w\, \tau_\ell^{1/p}}\right)^w \le \left(\frac{1}{2}\right)^{C \log n} = n^{-C}.$$

This part of the lemma now follows by scaling the $g_{z,\ell,j}$ by $\tau_\ell^{1/p}/B$ to obtain a bound on the $f_{z,\ell,j}$.
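Second part of lemma. Suppose at this time $v_j \ge \frac{C (\log^5 n) \tau_\ell^{1/p}}{B \gamma^{10}}$. The number $s$ of indicator variables is
minimized when there are $k - 1$ distinct $i$ for which $v_j^i = \frac{\tau_\ell^{1/p}}{kB}$, and one value of $i$ for which
$v_j^i = v_j - (k - 1) \cdot \frac{\tau_\ell^{1/p}}{kB}$. Hence,

$$s \ge v_j - (k - 1) \cdot \frac{\tau_\ell^{1/p}}{kB} - \frac{\tau_\ell^{1/p}}{kB} = v_j - \frac{\tau_\ell^{1/p}}{B}.$$

If the expectation is 1, then $f_{z,\ell,j} = s \cdot \tau_\ell^{1/p}/B$, and using that $v_j \ge \frac{C (\log^5 n) \tau_\ell^{1/p}}{B \gamma^{10}}$ establishes this part of
the lemma. Otherwise, applying Fact 1 with $s \ge v_j - \frac{\tau_\ell^{1/p}}{B} \ge \frac{C (\log^5 n) \tau_\ell^{1/p}}{2 B \gamma^{10}}$ and $q = \frac{B}{\tau_\ell^{1/p}}$, and using that
$qs \ge \frac{C \log^5 n}{2 \gamma^{10}}$, we have

$$\Pr\left[|g_{z,\ell,j} - qs| > \frac{\gamma^5}{2 \log^2 n} \cdot qs\right] \le n^{-\Omega(C)}.$$

Scaling by $\frac{\tau_\ell^{1/p}}{B} = \frac{1}{q}$, we have

$$\Pr\left[|f_{z,\ell,j} - s| > \frac{\gamma^5}{2 \log^2 n} \cdot s\right] \le n^{-\Omega(C)},$$

and since $v_j - \frac{\tau_\ell^{1/p}}{B} \le s \le v_j$,

$$\Pr\left[|f_{z,\ell,j} - v_j| > \frac{\gamma^5}{2 \log^2 n} \cdot v_j + \frac{\tau_\ell^{1/p}}{B}\right] \le n^{-\Omega(C)},$$

and finally using that $\frac{\tau_\ell^{1/p}}{B} \le \frac{\gamma^5}{2 \log^2 n} \cdot v_j$, and union-bounding over a stream of length $n$ as well as all choices of
$z$, $\ell$, and $j$, the lemma follows. ■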

5.3.2 Estimating Class Sizes 

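Define the classes $C_h$ as follows:

$$C_h = \{j \in [m] \mid \eta(1+\gamma)^h \le v_j < \eta(1+\gamma)^{h+1}\}.$$

Say that $C_h$ contributes at a point in time in the union of the $k$ streams if

$$|C_h| \cdot \eta^p (1+\gamma)^{ph} \ge \frac{\gamma \|v\|_p^p}{B^{1/2} \log(m/\eta^p)}.$$

Since the number of non-zero $|C_h|$ is $O(\gamma^{-1} \log(m/\eta^p))$, we have

$$\sum_{\text{non-contributing } h} |C_h| \cdot \eta^p (1+\gamma)^{ph} = O\left(\frac{\|v\|_p^p}{B^{1/2}}\right). \qquad (14)$$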
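The partition into classes can be illustrated with a short routine. It is a sketch only: the names `class_index` and `group_into_classes` are hypothetical, and the class boundaries are assumed to be $\eta(1+\gamma)^h \le v_j < \eta(1+\gamma)^{h+1}$, the same half-open interval used for $F_{z,h}$ in Algorithm 4.

```python
import math
from collections import defaultdict


def class_index(v_j: float, eta: float, gamma: float):
    """Return h with eta(1+gamma)^h <= v_j < eta(1+gamma)^(h+1), or None if v_j < eta."""
    if v_j < eta:
        return None
    h = int(math.floor(math.log(v_j / eta) / math.log1p(gamma)))
    # Guard against floating-point rounding at class boundaries.
    while eta * (1 + gamma) ** h > v_j:
        h -= 1
    while eta * (1 + gamma) ** (h + 1) <= v_j:
        h += 1
    return h


def group_into_classes(v, eta: float, gamma: float) -> dict:
    """Map each class index h to the list of coordinates j that fall in C_h."""
    classes = defaultdict(list)
    for j, v_j in enumerate(v):
        h = class_index(v_j, eta, gamma)
        if h is not None:
            classes[h].append(j)
    return dict(classes)
```

Because $\eta$ is drawn from $[0, 1]$ and the $v_j$ are non-negative integers, only zero coordinates are left out of every class.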
Lemma 9 With probability $1 - n^{-\Omega(C)}$, at all points in time in the union of the $k$ streams and for all $h$ and
$\ell$, for at least a 3/5 fraction of the $z \in [r]$,

$$|C_h \cap S_\ell^z| \le 3 \cdot 2^{-\ell} \cdot |C_h|.$$

Proof: The random variable $|C_h \cap S_\ell^z|$ is a sum of $|C_h|$ independent Bernoulli($2^{-\ell}$) random variables. By
a Markov bound, $\Pr[|C_h \cap S_\ell^z| \le 3 \cdot 2^{-\ell} |C_h|] \ge 2/3$. Letting $X_z$ be an indicator variable which is 1 iff
$|C_h \cap S_\ell^z| \le 3 \cdot 2^{-\ell} |C_h|$, the lemma follows by applying Fact 1 to the $X_z$, using that $r$ is large enough, and
union-bounding over a stream of length $n$ and all $h$ and $\ell$. ■

For a given $C_h$, let $\ell(h)$ be the value of $\ell$ for which $2^\ell \le \frac{2^p \tau}{\eta^p (1+\gamma)^{ph} B} < 2^{\ell+1}$, or $\ell = 0$ if no such $\ell$ exists.

Lemma 10 With probability $1 - n^{-\Omega(C)}$, at all points in time in the union of the $k$ streams and for all $h$, for
at least a 3/5 fraction of the $z \in [r]$,

1. $2^{\ell(h)} \cdot |C_h \cap S_{\ell(h)}^z| \le 3|C_h|$, and

2. if at this time $C_h$ contributes and $\|v\|_p^p \ge \frac{\tau}{5}$, then $2^{\ell(h)} \cdot |C_h \cap S_{\ell(h)}^z| = (1 \pm \gamma)|C_h|$.

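Proof: We show this statement for a fixed $h$ and at a particular point in time in the union of the $k$ streams.
The lemma will follow by a union bound.

The first part of the lemma follows from Lemma 9.

We now prove the second part. In this case $\|v\|_p^p \ge \frac{\tau}{5}$. We can assume that there exists an $\ell$ for which
$2^\ell \le \frac{2^p \tau}{\eta^p (1+\gamma)^{ph} B} < 2^{\ell+1}$. Indeed, otherwise $\ell(h) = 0$ and $|C_h \cap S_{\ell(h)}^z| = |C_h|$ and the second part of the
lemma follows.

Let $q(z) = |C_h \cap S_{\ell(h)}^z|$, which is a sum of independent indicator random variables and so $\mathrm{Var}[q(z)] \le E[q(z)]$. Also,

$$E[q(z)] = 2^{-\ell} |C_h| \ge \frac{\eta^p (1+\gamma)^{ph} B}{2^p \tau} \cdot |C_h|. \qquad (15)$$

Since $C_h$ contributes, $|C_h| \cdot \eta^p \cdot (1+\gamma)^{ph} \ge \frac{\gamma \|v\|_p^p}{B^{1/2} \log(m/\eta^p)}$, and combining this with (15),

$$E[q(z)] \ge \frac{\gamma B^{1/2} \|v\|_p^p}{2^p \tau \log(m/\eta^p)} \ge \frac{\gamma B^{1/2}}{5 \cdot 2^p \log(m/\eta^p)}.$$

It follows that for $B$ sufficiently large, $E[q(z)] \ge \frac{3}{\gamma^2}$, and so by Chebyshev's inequality,

$$\Pr\big[|q(z) - E[q(z)]| \ge \gamma E[q(z)]\big] \le \frac{\mathrm{Var}[q(z)]}{\gamma^2 E^2[q(z)]} \le \frac{1}{3}.$$

Since $E[q(z)] = 2^{-\ell} |C_h|$, and $r = \Theta(\log n)$ is large enough, the lemma follows by a Chernoff bound. ■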
5.3.3 Combining Individual Frequency Estimation and Class Size Estimation

We define the set $T$ to be the set of times in the input stream for which the $F_p$-value of the union of the $k$
streams first exceeds $(1+\gamma)^i$ for an $i$ satisfying

$$0 \le i \le \log_{(1+\gamma)} 2^p \cdot \tau.$$

Lemma 11 With probability $1 - O(\gamma)$, for all times in $T$ and all $h$,

1. $c_h \le 3|C_h| + 3\gamma(2+\gamma)(|C_{h-1}| + |C_{h+1}|)$, and

2. if at this time $C_h$ contributes and $\|v\|_p^p \ge \frac{\tau}{5}$, then

$$(1 - 4\gamma)|C_h| \le c_h \le (1+\gamma)|C_h| + 3\gamma(2+\gamma)(|C_{h-1}| + |C_{h+1}|).$$

Proof: We assume the events of Lemma 8 and Lemma 10 occur, and we add $n^{-\Omega(C)}$ to the error probability.
Let us fix a class $C_h$, a point in time in $T$, and a $z \in [r]$ which is among the at least $3r/5$ different $z$ that
satisfy Lemma 10 at this point in time.

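By Lemma 8, for any $j \in C_h \cap S_{\ell(h)}^z$ for which $v_j \ge \frac{C (\log^5 n) \tau_{\ell(h)}^{1/p}}{B \gamma^{10}}$, if

$$\big|\min(v_j - \eta(1+\gamma)^h,\ \eta(1+\gamma)^{h+1} - v_j)\big| \ge \frac{\gamma^5}{\log^2 n} \cdot v_j, \qquad (16)$$

then $j \in F_{z,h}$. Let us first verify that for $j \in C_h$, we have $v_j \ge \frac{C (\log^5 n) \tau_{\ell(h)}^{1/p}}{B \gamma^{10}}$. We have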

and so 

where the final inequality follows for sufficiently large $B = $ poly$(\epsilon^{-1} \log n)$ and $p > 1$.
It remains to consider the case when (16) does not hold. 

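Conditioned on all other randomness, $\eta \in [0, 1]$ is uniformly random subject to $v_j \in C_h$, or equivalently,

$$\frac{v_j}{(1+\gamma)^{h+1}} < \eta \le \frac{v_j}{(1+\gamma)^h}.$$

If (16) does not hold, then either

$$\eta \ge \frac{(1 - \gamma^5/\log^2 n)\, v_j}{(1+\gamma)^h} \quad \text{or} \quad \eta \le \frac{(1 + \gamma^5/\log^2 n)\, v_j}{(1+\gamma)^{h+1}}.$$

Hence, the probability over $\eta$ that inequality (16) holds is at least

$$1 - \frac{\frac{\gamma^5 v_j}{(1+\gamma)^h \log^2 n} + \frac{\gamma^5 v_j}{(1+\gamma)^{h+1} \log^2 n}}{\frac{v_j}{(1+\gamma)^h} - \frac{v_j}{(1+\gamma)^{h+1}}} = 1 - \frac{\gamma^4 (2+\gamma)}{\log^2 n}.$$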
It follows by a Markov bound that

$$\Pr\Big[|C_h \cap S_{\ell(h)}^z \cap F_{z,h}| < |C_h \cap S_{\ell(h)}^z| \cdot (1 - \gamma(2+\gamma))\Big] \le \frac{\gamma^3}{\log^2 n}. \qquad (18)$$

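Now we must consider the case that there is a $j' \in C_{h'} \cap S_{\ell(h)}^z$ for which $j' \in F_{z,h}$ for an $h' \ne h$. There are
two cases, namely, if $v_{j'} < \frac{C (\log^5 n) \tau_{\ell(h)}^{1/p}}{B \gamma^{10}}$ or if $v_{j'} \ge \frac{C (\log^5 n) \tau_{\ell(h)}^{1/p}}{B \gamma^{10}}$. We handle each case in turn.

Case: $v_{j'} < \frac{C (\log^5 n) \tau_{\ell(h)}^{1/p}}{B \gamma^{10}}$. Then by Lemma 8,

$$f_{z,\ell(h),j'} \le 2e \cdot v_{j'} + \frac{C \tau_{\ell(h)}^{1/p} \log n}{B}.$$

Therefore, it suffices to show that

$$2e \cdot \frac{C (\log^5 n) \tau_{\ell(h)}^{1/p}}{B \gamma^{10}} + \frac{C \tau_{\ell(h)}^{1/p} \log n}{B} < \eta(1+\gamma)^h,$$

from which we can conclude that $j' \notin F_{z,h}$. But by (17),

where the last inequality follows for sufficiently large $B = $ poly$(\epsilon^{-1} \log n)$. Hence, $j' \notin F_{z,h}$.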

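Case: $v_{j'} \ge \frac{C (\log^5 n) \tau_{\ell(h)}^{1/p}}{B \gamma^{10}}$. We claim that $h' \in \{h - 1, h + 1\}$. Indeed, by Lemma 8 we must have

$$\eta(1+\gamma)^h - \frac{\gamma^5}{\log^2 n} \cdot v_{j'} \le v_{j'} < \eta(1+\gamma)^{h+1} + \frac{\gamma^5}{\log^2 n} \cdot v_{j'}.$$

This is equivalent to

$$\frac{\eta(1+\gamma)^h}{1 + \gamma^5/\log^2 n} \le v_{j'} < \frac{\eta(1+\gamma)^{h+1}}{1 - \gamma^5/\log^2 n}.$$

If $j' \in C_{h'}$ for $h' < h - 1$, then

$$v_{j'} < \eta(1+\gamma)^{h-1} = \frac{\eta(1+\gamma)^h}{1+\gamma} < \frac{\eta(1+\gamma)^h}{1 + \gamma^5/\log^2 n},$$

which is impossible. Also, if $j' \in C_{h'}$ for $h' > h + 1$, then

$$v_{j'} \ge \eta(1+\gamma)^{h+2} = \eta(1+\gamma)^{h+1} \cdot (1+\gamma) > \frac{\eta(1+\gamma)^{h+1}}{1 - \gamma^5/\log^2 n},$$

which is impossible. Hence, $h' \in \{h - 1, h + 1\}$.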

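Let $N_{z,h} = F_{z,h} \setminus C_h$. Then

$$E[|N_{z,h}|] \le \frac{\gamma^4 (2+\gamma)}{\log^2 n} \cdot \big(|C_{h-1} \cap S_{\ell(h)}^z| + |C_{h+1} \cap S_{\ell(h)}^z|\big). \qquad (19)$$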
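By (18) and applying a Markov bound to (19), together with a union bound, with probability $\ge 1 - \frac{2\gamma^3}{\log^2 n}$,

$$(1 - \gamma(2+\gamma)) \cdot |C_h \cap S_{\ell(h)}^z| \le |F_{z,h}| \qquad (20)$$
$$\le |C_h \cap S_{\ell(h)}^z| + \gamma(2+\gamma) \cdot \big(|C_{h-1} \cap S_{\ell(h)}^z| + |C_{h+1} \cap S_{\ell(h)}^z|\big). \qquad (21)$$

By Lemma 9,

$$2^{\ell(h)} |C_{h-1} \cap S_{\ell(h)}^z| \le 3|C_{h-1}| \quad \text{and} \quad 2^{\ell(h)} |C_{h+1} \cap S_{\ell(h)}^z| \le 3|C_{h+1}|. \qquad (22)$$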

First part of lemma. At this point we can prove the first part of this lemma. By the first part of Lemma 10,

$$2^{\ell(h)} \cdot |C_h \cap S_{\ell(h)}^z| \le 3|C_h|. \qquad (23)$$

Combining (21), (22), and (23), we have with probability at least $1 - \frac{2\gamma^3}{\log^2 n} - n^{-\Omega(C)}$,

$$2^{\ell(h)} |F_{z,h}| \le 3|C_h| + 3\gamma(2+\gamma)(|C_{h-1}| + |C_{h+1}|).$$

Since this holds for at least $3r/5$ different $z$, it follows that

$$c_h \le 3|C_h| + 3\gamma(2+\gamma)(|C_{h-1}| + |C_{h+1}|),$$

and the first part of the lemma follows by a union bound. Indeed, the number of $h$ is $O(\gamma^{-1} \log(m/\eta^p))$,
which with probability $1 - 1/n$, say, is $O(\gamma^{-1} \log n)$ since with this probability $\eta^p \ge 1/n^p$, and always
$m \le $ poly$(n)$. Also, $|T| = O(\gamma^{-1} \log n)$. Hence, the probability this holds for all $h$ and all times in $T$ is
$1 - O(\gamma)$.



Second part of the lemma. Now we can prove the second part of the lemma. By the second part of
Lemma 10, if at this time $C_h$ contributes and $\|v\|_p^p \ge \frac{\tau}{5}$, then

$$2^{\ell(h)} \cdot |C_h \cap S_{\ell(h)}^z| = (1 \pm \gamma)|C_h|. \qquad (24)$$

Combining (20), (21), (22), and (24), we have with probability at least $1 - \frac{2\gamma^3}{\log^2 n} - n^{-\Omega(C)}$,

$$(1 - \gamma(2+\gamma))(1 - \gamma)|C_h| \le 2^{\ell(h)} |F_{z,h}| \le (1+\gamma)|C_h| + 3\gamma(2+\gamma)(|C_{h-1}| + |C_{h+1}|).$$

Since this holds for at least $3r/5$ different $z$, it follows that

$$(1 - \gamma(2+\gamma))(1 - \gamma)|C_h| \le c_h \le (1+\gamma)|C_h| + 3\gamma(2+\gamma)(|C_{h-1}| + |C_{h+1}|),$$

and the second part of the lemma now follows by a union bound over all $h$ and all times in $T$, exactly in
the same way as the first part of the lemma. Note that $1 - 4\gamma \le (1 - \gamma(2+\gamma))(1 - \gamma)$ for small enough
$\gamma = \Theta(\epsilon)$. ■
5.3.4 Putting It All Together

Lemma 12 With probability at least 5/6, at all times the coordinator's output is correct.

Proof: The coordinator outputs 0 up until the first point in time in the union of the $k$ streams for which
$\sum_{h \ge 0} c_h \cdot \eta^p \cdot (1+\gamma)^{ph} > (1 - \epsilon/2)\tau$. It suffices to show that

$$\sum_{h \ge 0} c_h \eta^p (1+\gamma)^{ph} = (1 \pm \epsilon/2) \|v\|_p^p \qquad (25)$$

at all times in the stream. We first show that with probability at least 5/6, for all times in $T$,

$$\sum_{h \ge 0} c_h \eta^p (1+\gamma)^{ph} = (1 \pm \epsilon/4) \|v\|_p^p, \qquad (26)$$

and then use the structure of $T$ and the protocol to argue that (25) holds at all times in the stream.

Fix a particular time in $T$. We condition on the event of Lemma 11, which by setting $\gamma = \Theta(\epsilon)$ small
enough, can assume occurs with probability at least 5/6.

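First, suppose at this point in time we have $\|v\|_p^p < \frac{\tau}{5}$. Then by Lemma 11, for sufficiently small
$\gamma = \Theta(\epsilon)$, we have

$$\sum_{h \ge 0} c_h \eta^p (1+\gamma)^{ph} \le \sum_{h \ge 0} \big(3|C_h| + 3\gamma(2+\gamma)(|C_{h-1}| + |C_{h+1}|)\big) \cdot \eta^p (1+\gamma)^{ph}$$
$$\le \sum_{h \ge 0} \Big(3 \sum_{j \in C_h} v_j^p + 3\gamma(2+\gamma)(1+\gamma)^p \sum_{j \in C_{h-1} \cup C_{h+1}} v_j^p\Big),$$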



and so the coordinator will correctly output 0, provided e < |. 



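We now handle the case $\|v\|_p^p \ge \frac{\tau}{5}$. Then for all contributing $C_h$, we have

$$(1 - 4\gamma)|C_h| \le c_h \le (1+\gamma)|C_h| + 3\gamma(2+\gamma)(|C_{h-1}| + |C_{h+1}|),$$

while for all $C_h$, we have

$$c_h \le 3|C_h| + 3\gamma(2+\gamma)(|C_{h-1}| + |C_{h+1}|).$$

Hence, using (14),

$$\sum_{h \ge 0} c_h \eta^p (1+\gamma)^{ph} \ge \sum_{\text{contributing } C_h} (1 - 4\gamma)|C_h| \eta^p (1+\gamma)^{ph} \ge \frac{1 - 4\gamma}{(1+\gamma)^p} \sum_{\text{contributing } C_h} \sum_{j \in C_h} v_j^p \ge (1 - 6\gamma)^p (1 - O(1/B^{1/2})) \cdot \|v\|_p^p.$$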

For the other direction,

$$\sum_{h \ge 0} c_h \eta^p (1+\gamma)^{ph} \le \sum_{\text{contributing } C_h} (1+\gamma)|C_h| \eta^p (1+\gamma)^{ph} + \sum_{\text{non-contributing } C_h} 3|C_h| \eta^p (1+\gamma)^{ph} + \sum_{h \ge 0} 3\gamma(2+\gamma)(|C_{h-1}| + |C_{h+1}|) \eta^p (1+\gamma)^{ph}$$
$$\le (1+\gamma) \sum_{\text{contributing } C_h} \sum_{j \in C_h} v_j^p + O(1/B^{1/2}) \|v\|_p^p + O(\gamma) \|v\|_p^p$$
$$\le (1 + O(\gamma) + O(1/B^{1/2})) \|v\|_p^p.$$

Hence, (26) follows for all times in $T$ provided that $\gamma = \Theta(\epsilon)$ is small enough and $B = $ poly$(\epsilon^{-1} \log n)$ is
large enough.

It remains to argue that (25) holds for all points in time in the union of the $k$ streams. Recall that each
time in the union of the $k$ streams for which $\|v\|_p^p \ge (1+\gamma)^i$ for an integer $i$ is included in $T$, provided
$\|v\|_p^p \le 2^p \tau$.

The key observation is that the quantity $\sum_{h \ge 0} c_h \eta^p (1+\gamma)^{ph}$ is non-decreasing, since the values
$f_{z,\ell,j}$ are non-decreasing. Now, the value of $\|v\|_p^p$ at a time $t$ not in $T$ is, by definition of $T$, within a factor of
$(1 \pm \gamma)$ of the value of $\|v\|_p^p$ for some time in $T$. Since (26) holds for all times in $T$, it follows that the value
of $\sum_{h \ge 0} c_h \eta^p (1+\gamma)^{ph}$ at time $t$ satisfies

$$(1 - \gamma)(1 - \epsilon/4) \|v\|_p^p \le \sum_{h \ge 0} c_h \eta^p (1+\gamma)^{ph} \le (1 + \gamma)(1 + \epsilon/4) \|v\|_p^p,$$

which implies for $\gamma = \Theta(\epsilon)$ small enough that (25) holds for all points in time in the union of the $k$ streams.
This completes the proof. ■

Theorem 8 (MAIN) With probability at least 2/3, at all times the coordinator's output is correct and the
total communication is $k^{p-1} \cdot$ poly$(\epsilon^{-1} \log n)$ bits.

Proof: Consider the setting of $v^1, \ldots, v^k$ at the first time in the stream for which $\|\sum_{i=1}^k v^i\|_p^p > \tau$. For
any non-negative integer vector $w$ and any update $e_j$, we have $\|w + e_j\|_p^p \le (\|w\|_p + 1)^p \le 2^p \|w\|_p^p$.
Since $\|\sum_{i=1}^k v^i\|_p^p$ is an integer and $\tau \ge 1$, we therefore have $\|\sum_{i=1}^k v^i\|_p^p \le 2^p \cdot \tau$. By Lemma 7, the
expected communication for these $v^1, \ldots, v^k$ is $k^{p-1} \cdot$ poly$(\epsilon^{-1} \log n)$ bits, so with probability at least 5/6
the communication is $k^{p-1} \cdot$ poly$(\epsilon^{-1} \log n)$ bits. By Lemma 12, with probability at least 5/6, the protocol
terminates at or before the time for which the inputs held by the players equal $v^1, \ldots, v^k$. The theorem
follows by a union bound. ■



6 Related Problems 

In this section we show that the techniques we have developed for distributed $F_0$ and $F_p$ ($p > 1$) can also
be used to solve other fundamental problems. In particular, we consider the problems: all-quantile, heavy
hitters, entropy and $\ell_p$ for any $p > 0$.



6.1 The All-Quantile and Heavy Hitters

We first give the definitions of the problems. Given a set $A = \{a_1, a_2, \ldots, a_m\}$ where each $a_i$ is drawn
from the universe $[N]$, let $f_i$ be the frequency of item $i$ in the set $A$. Thus $\sum_{i \in [N]} f_i = m$.

Definition 3 ($\phi$-heavy hitters) For any $0 \le \phi \le 1$, the set of $\phi$-heavy hitters of $A$ is $H_\phi(A) = \{x \mid f_x \ge
\phi m\}$. If an $\epsilon$-approximation is allowed, then the returned set of heavy hitters must contain $H_\phi(A)$ and
cannot include any $x$ such that $f_x < (\phi - \epsilon)m$. If $(\phi - \epsilon)m \le f_x < \phi m$, then $x$ may or may not be included
in $H_\phi(A)$.
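Definition 3 can be made concrete with a short checker. This is an illustrative sketch: the function names are hypothetical, and the boundary conventions ($f_x \ge \phi m$ for mandatory items and $f_x < (\phi - \epsilon)m$ for forbidden items) are assumed to match the definition above.

```python
from collections import Counter


def heavy_hitters(items, phi: float) -> set:
    """Exact phi-heavy hitters: all x whose frequency is at least phi * m."""
    counts = Counter(items)
    m = len(items)
    return {x for x, c in counts.items() if c >= phi * m}


def is_valid_approximation(returned, items, phi: float, eps: float) -> bool:
    """Check whether `returned` is an acceptable eps-approximate answer."""
    counts = Counter(items)
    m = len(items)
    returned = set(returned)
    must_have = {x for x, c in counts.items() if c >= phi * m}
    forbidden = {x for x, c in counts.items() if c < (phi - eps) * m}
    # Items that never occur have frequency 0, which is forbidden when phi > eps.
    unseen_ok = all(x in counts or (phi - eps) * m <= 0 for x in returned)
    return must_have <= returned and not (returned & forbidden) and unseen_ok
```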
Definition 4 ($\phi$-quantile) For any $0 \le \phi \le 1$, the $\phi$-quantile of $A$ is some $x$ such that there are at most
$\phi m$ items of $A$ that are smaller than $x$ and at most $(1 - \phi)m$ items of $A$ that are greater than $x$. If an $\epsilon$-approximation
is allowed, then when asking for the $\phi$-quantile of $A$ we are allowed to return any $\phi'$-quantile
of $A$ such that $\phi - \epsilon \le \phi' \le \phi + \epsilon$.

Definition 5 (All-quantile) The $\epsilon$-approximate all-quantile (QUAN) problem is defined in the coordinator
model, where we have $k$ sites and a coordinator. Site $S_i$ ($i \in [k]$) has a set $A_i$ of items. The $k$ sites want
to communicate with the coordinator so that at the end of the process the coordinator can construct a data
structure from which all $\epsilon$-approximate $\phi$-quantiles for any $0 \le \phi \le 1$ can be extracted. The cost is defined
as the total number of bits exchanged between the coordinator and the $k$ sites.

Theorem 9 Any protocol that computes $\epsilon$-approximate QUAN or $\epsilon$-approximate $\min\{\frac{1}{2}, \frac{\epsilon\sqrt{k}}{2}\}$-heavy hitters
with error probability $\delta$ for some sufficiently small constant $\delta$ has communication complexity $\Omega(\min\{\sqrt{k}/\epsilon, 1/\epsilon^2\})$
bits.

Proof: We first prove the theorem for QUAN. In the case that $k \ge 1/\epsilon^2$ we prove an $\Omega(1/\epsilon^2)$ lower
bound. We prove this by a simple reduction from $k$-GAP-MAJ. We can assume $k = 1/\epsilon^2$ since if $k > 1/\epsilon^2$
then we can just give inputs to the first $1/\epsilon^2$ sites. Set $\beta = 1/2$. Given a random input $Z_1, Z_2, \ldots, Z_k$ of
$k$-GAP-MAJ chosen from distribution $\mu$, we simply give $Z_i$ to site $S_i$ for each $1 \le i \le k$. It is
easy to observe that a protocol that computes $\epsilon/2$-approximate QUAN on $A = \{Z_1, Z_2, \ldots, Z_k\}$ with error
probability $\delta$ also computes $k$-GAP-MAJ on input distribution $\mu$ with error probability $\delta$, since the answer
to $k$-GAP-MAJ is simply the answer to the $\frac{1}{2}$-quantile. The $\Omega(1/\epsilon^2)$ lower bound follows from Corollary 1.

In the case that $k < 1/\epsilon^2$ we prove an $\Omega(\sqrt{k}/\epsilon)$ lower bound. We again perform a reduction from
$k$-GAP-MAJ. Set $\beta = 1/2$. The reduction works as follows. We are given $\ell = 1/(\epsilon\sqrt{k})$ independent
copies of $k$-GAP-MAJ with $Z^1, Z^2, \ldots, Z^\ell$ being the inputs, where $Z^i = \{Z_1^i, Z_2^i, \ldots, Z_k^i\} \in \{0, 1\}^k$ is
chosen from distribution $\mu$. We construct an input for QUAN by giving the $j$-th site the item set $A_j =
\{Z_j^1, 2 + Z_j^2, 4 + Z_j^3, \ldots, 2(\ell - 1) + Z_j^\ell\}$. It is not difficult to observe that a protocol that computes $\epsilon/2$-approximate
QUAN on the set $A = \{A_1, A_2, \ldots, A_k\}$ with error probability $\delta$ also computes the answer to
each copy of $k$-GAP-MAJ on distribution $\mu$ with error probability $\delta$, simply by returning $(X_i - 2(i - 1))$
for the $i$-th copy of $k$-GAP-MAJ, where $X_i$ is the $\epsilon/2$-approximate $\frac{i - 1/2}{\ell}$-quantile.

On the other hand, any protocol that computes each of the $\ell$ independent copies of $k$-GAP-MAJ correctly
with error probability $\delta$ for a sufficiently small constant $\delta$ has communication complexity $\Omega(\sqrt{k}/\epsilon)$. This is
simply because for any transcript $\Pi$, by Corollary 1, independence and the chain rule we have that

$$I(Z^1, Z^2, \ldots, Z^\ell; \Pi) \ge \sum_{i \in [\ell]} I(Z^i; \Pi) \ge \Omega(\ell k) \ge \Omega(\sqrt{k}/\epsilon). \qquad (27)$$

By our reduction the theorem follows.

The proof for heavy hitters is done by essentially the same reduction as that for QUAN. In the case
that $k = 1/\epsilon^2$ (or $k \ge 1/\epsilon^2$ in general), a protocol that computes $\epsilon/2$-approximate $\frac{1}{2}$-heavy hitters on
$A = \{Z_1, Z_2, \ldots, Z_k\}$ with error probability $\delta$ also computes $k$-GAP-MAJ on input distribution $\mu$ with
error probability $\delta$. In the case that $k < 1/\epsilon^2$, it also holds that a protocol that computes $\epsilon/2$-approximate
$\frac{\epsilon\sqrt{k}}{2}$-heavy hitters on the set $A = \{A_1, A_2, \ldots, A_k\}$ where $A_j = \{Z_j^1, 2 + Z_j^2, 4 + Z_j^3, \ldots, 2(\ell - 1) + Z_j^\ell\}$
with error probability $\delta$ also computes the answer to each copy of $k$-GAP-MAJ on distribution $\mu$ with error
probability $\delta$. ■
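The input construction used in the reduction for $k < 1/\epsilon^2$ can be sketched as follows. The helper names are hypothetical, and for simplicity the sketch reads off an exact middle element of each block of $k$ merged items, which stands in for the approximate quantile query that a real QUAN protocol would answer.

```python
def build_quantile_input(copies):
    """copies[i][j] is the bit of site j in copy i; returns the item sets A_1..A_k.

    Site j receives the items 2*i + copies[i][j] for every copy i (0-indexed).
    """
    k = len(copies[0])
    return [[2 * i + copies[i][j] for i in range(len(copies))] for j in range(k)]


def recover_majorities(site_sets):
    """Recover one answer bit per copy from the merged, sorted items."""
    k = len(site_sets)
    merged = sorted(x for a in site_sets for x in a)
    num_copies = len(merged) // k
    answers = []
    for i in range(num_copies):
        block = merged[i * k:(i + 1) * k]   # the k items contributed by copy i
        middle = block[k // 2]              # middle element of this block
        answers.append(middle - 2 * i)      # subtract the offset of copy i
    return answers
```

Since copy $i$ only produces the values $2i$ and $2i + 1$, the blocks never interleave after sorting, which is what lets a single quantile structure answer all copies at once.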
6.2 Entropy Estimation

We are given a set $A = \{(e_1, a_1), (e_2, a_2), \ldots, (e_m, a_m)\}$ where each $e_k$ ($k \in [m]$) is drawn from the
universe $[N]$, and $a_k \in \{+1, -1\}$ denotes an insertion or a deletion of item $e_k$. The entropy estimation
problem (ENTROPY) asks for the value $H(A) = \sum_{j \in [N]} (|f_j|/L) \log(L/|f_j|)$ where $f_j = \sum_{k : e_k = j} a_k$ and
$L = \sum_{j \in [N]} |f_j|$. In the $\epsilon$-approximate ENTROPY problem, the items in set $A$ are distributed to $k$ sites
who want to compute a value $\hat{H}(A)$ for which $|\hat{H}(A) - H(A)| \le \epsilon$. In this section we prove the following
theorem.



Theorem 10 There exists an input distribution such that any protocol that computes $\epsilon$-approximate ENTROPY
on this distribution correctly with error probability at most $\delta$ for some sufficiently small constant $\delta$
has communication complexity $\tilde{\Omega}(k/\epsilon^2)$.
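For concreteness, the quantity $H(A)$ from the problem definition can be computed from a list of signed updates as in the sketch below. The function name is hypothetical, and the logarithm is taken base 2, which is an assumption of this example.

```python
import math
from collections import defaultdict


def empirical_entropy(updates) -> float:
    """H(A) = sum_j (|f_j| / L) * log2(L / |f_j|) for updates (item, sign)."""
    f = defaultdict(int)
    for item, sign in updates:
        f[item] += sign  # sign is +1 for an insertion, -1 for a deletion
    total = sum(abs(c) for c in f.values())
    if total == 0:
        return 0.0
    return sum(
        abs(c) / total * math.log2(total / abs(c))
        for c in f.values()
        if c != 0
    )
```

In the distributed setting the updates are split across the $k$ sites, and $H(A)$ refers to the entropy of the concatenation of all their update lists.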

Proof: As with $F_2$, we prove the lower bound for the ENTROPY problem by a reduction from $k$-BTX.
Given a random input $B$ for $k$-BTX according to distribution $\nu$ with $n = \gamma^2 k^2$ for some parameter $\gamma =
\log^{-d}(k/\epsilon)$ for large enough constant $d$, we construct an input for ENTROPY as follows. Each block
$j \in [1/\epsilon^2]$ in $k$-BTX corresponds to one coordinate item $e_j$ in the vector for ENTROPY; so we have in total
$1/\epsilon^2$ items in the entropy vector. The $k$ sites first use shared randomness to sample $\gamma^2 k^2/\epsilon^2$ random $\pm 1$
values for each coordinate across all blocks in $B$. Let $\{R_1^1, R_1^2, \ldots, R_{1/\epsilon^2}^{\gamma^2 k^2}\}$ be these random $\pm 1$ values.
Each site looks at each of its bits $B_{i,\ell}^j$ ($i \in [k]$, $\ell \in [\gamma^2 k^2]$, $j \in [1/\epsilon^2]$) and generates an item $(e_j, R_j^\ell)$ (recall
that $R_j^\ell$ denotes insertion or deletion of the item $e_j$) if $B_{i,\ell}^j = 1$. Call the resulting input distribution $\nu'$.

We call an item in group $G_P$ if the $k$-XOR instance in the corresponding block is a 00-instance; and in
group $G_Q$ if it is a 11-instance; in group $G_U$ if it is a 01-instance or a 10-instance. Group $G_U$ is further
divided into two subgroups $G_{U_1}$ and $G_{U_2}$, containing all 10-instances and all 01-instances, respectively. Let
$P, Q, U, U_1, U_2$ be the cardinalities of these groups. Now we consider the frequency of each item type.

1. For an item $e_j \in G_P$, its frequency $f_j$ is distributed as follows: we choose a value $i$ from the binomial
distribution on $n$ values each with probability 1/2, then we take the sum $\kappa_j$ of $i$ i.i.d. $\pm 1$ random
variables. We can thus write $|f_j| = |\kappa_j|$.

2. For an item $e_j \in G_Q$, its frequency $f_j$ is distributed as follows: we choose a value $i$ from the binomial
distribution on $n$ values each with probability 1/2, then we take the sum $\kappa_j$ of $i$ i.i.d. $\pm 1$ random
variables. Then we add the value $R_j^{\ell^*} \cdot k$, where $\ell^*$ is the index of the special column in block $j$. We
can thus write $|f_j|$ as $|k + R_j^{\ell^*} \cdot \kappa_j|$. By a Chernoff-Hoeffding bound, with probability $1 - 2e^{-\lambda^2/2}$,
we have $|\kappa_j| \le \lambda\gamma k$. We choose $\lambda = \log(k/\epsilon)$, and thus $\lambda\gamma = o(1)$. Therefore $\kappa_j$ will not affect the
sign of $f_j$ for any $j$ (by a union bound) and we can write $|f_j| = k + R_j^{\ell^*} \cdot \kappa_j$. Since $\kappa_j$ is symmetric
about 0 and $R_j^{\ell^*}$ is a random $\pm 1$ variable, we can simply drop $R_j^{\ell^*}$ and write $|f_j| = k + \kappa_j$.

3. For an item $e_j \in G_U$, its frequency $f_j$ is distributed as follows: we choose a value $i$ from the binomial
distribution on $n$ values each with probability 1/2, then we take the sum $\kappa_j$ of $i$ i.i.d. $\pm 1$ random
variables. Then we add the value $R_j^{\ell^*} \cdot k/2$, where $\ell^*$ is the index of the special column in block $j$.
We can thus write $|f_j|$ as $|k/2 + R_j^{\ell^*} \cdot \kappa_j|$. As in the previous case, with probability $1 - 2e^{-\lambda^2/2}$, $\kappa_j$
will not affect the sign of $f_j$ and we can write $|f_j| = k/2 + \kappa_j$.



By Newman's theorem (cf. [41], Chapter 3) we can get rid of the public randomness by increasing the total communication
complexity by no more than an additive $O(\log(\gamma k/\epsilon))$ factor which is negligible in our proof.

By a union bound, with error probability at most $\delta_1 = \frac{1}{\epsilon^2} \cdot 2e^{-\lambda^2/2} = o(1)$, each $\kappa_j$ ($e_j \in G_Q \cup G_U$)
will not affect the sign of the corresponding $f_j$. Moreover, by another Chernoff bound we have that with
error probability $\delta_2 = 10e^{-c_0^2/3}$, $P, Q, U_1, U_2$ are equal to $1/(4\epsilon^2) \pm c_0/\epsilon$, and $U = 1/(2\epsilon^2) \pm c_0/\epsilon$. Here
$\delta_2$ can be sufficiently small if we set constant $c_0$ sufficiently large. Thus we have that with arbitrarily small
constant error $\delta_0 = \delta_1 + \delta_2$, all the concentration results claimed above hold. For simplicity we neglect this
part of error since it can be arbitrarily small and will not affect any of the analysis. In the rest of this section
we will ignore arbitrarily small errors and drop some lower order terms as long as such operations will not
affect any of the analysis.

The analysis of the next part is similar to that for our $F_2$ lower bound, where we end up computing
$F_2$ on three different vectors. Let us calculate $H_0$, $H_1$ and $H_2$, which stand for the entropies of all $k$ sites,
the first $k/2$ sites and the second $k/2$ sites, respectively. Then we show that using $H_0$, $H_1$ and $H_2$ we can
estimate $U$ well, and thus compute $k$-BTX correctly with an arbitrarily small constant error. Thus if there is
a protocol for ENTROPY on distribution $\nu'$ then we obtain a protocol for $k$-BTX on distribution $\nu$ with the
same communication complexity, completing the reduction and consequently proving Theorem 10.

Before computing $H_0$, $H_1$ and $H_2$, we first compute the total number $L$ of items. We can write

$$L = \sum_{e_j \in G_P} |\kappa_j| + \sum_{e_j \in G_Q} (k + \kappa_j) + \sum_{e_j \in G_U} (k/2 + \kappa_j)$$
$$= Q \cdot k + U \cdot k/2 + \sum_{e_j \in G_P} |\kappa_j| + \sum_{e_j \in G_Q \cup G_U} \kappa_j. \qquad (28)$$

The fourth term in (28) can be bounded by $O(\gamma k/\epsilon)$ with arbitrarily large constant probability, using a
Chernoff-Hoeffding bound, which will be $o(\epsilon L)$ and thus can be dropped. For the third term, by Chebyshev's
inequality we can assume (by increasing the constant in the big-Oh) that with arbitrarily large constant
probability, $\sum_{e_j \in G_P} |\kappa_j| = (1 \pm \epsilon) \cdot 1/(4\epsilon^2) \cdot E[|\kappa_j|]$, where $E[|\kappa_j|] = \Theta(\gamma k)$ follows by approximating
the binomial distribution by a normal distribution (or, e.g., Khintchine's inequality). Let $z_1 = E[|\kappa_j|]$ be a
value which can be computed exactly. Then, $\sum_{e_j \in G_P} |\kappa_j| = 1/(4\epsilon^2) \cdot z_1 \pm O(\gamma k/\epsilon) = z_1/(4\epsilon^2) \pm o(\epsilon L)$,
and so we can drop the additive $o(\epsilon L)$ term.
Finally, we get,

$$L = Q \cdot k + U \cdot k/2 + r_1 \qquad (29)$$

where $r_1 = z_1/(4\epsilon^2) = \Theta(\gamma k/\epsilon^2)$ is a value that can be computed by any site without any communication.
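Let $p_j = |f_j|/L$ ($j \in [1/\epsilon^2]$). We can write $H_0$ as follows.

$$H_0 = \sum_{e_j \in G_P} p_j \log(1/p_j) + \sum_{e_j \in G_Q} p_j \log(1/p_j) + \sum_{e_j \in G_U} p_j \log(1/p_j)$$
$$= \log L - S/L \qquad (30)$$

where

$$S = \sum_{e_j \in G_P} |f_j| \log |f_j| + \sum_{e_j \in G_Q} |f_j| \log |f_j| + \sum_{e_j \in G_U} |f_j| \log |f_j|. \qquad (31)$$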

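We consider the three summands in (31) one by one. For the second term in (31), we have

$$\sum_{e_j \in G_Q} |f_j| \log |f_j| = \sum_{e_j \in G_Q} (k + \kappa_j) \log(k + \kappa_j)$$
$$= Q \cdot k \log k + \sum_{e_j \in G_Q} \big(\kappa_j \log k + (k + \kappa_j) \log(1 + \kappa_j/k)\big). \qquad (32)$$

The second term in (32) is at most $o(\epsilon Q \cdot \gamma k) = o(k/\epsilon)$, and can be dropped. By a similar analysis we
can obtain that the third term in (31) is (up to an $o(k/\epsilon)$ term)

$$\sum_{e_j \in G_U} |f_j| \log |f_j| = U \cdot (k/2) \log(k/2). \qquad (33)$$

Now consider the first term. We have

$$\sum_{e_j \in G_P} |f_j| \log |f_j| = \sum_{e_j \in G_P} |\kappa_j| \log |\kappa_j|$$
$$= (1/(4\epsilon^2) \pm O(1/\epsilon)) \cdot E[|\kappa_j| \log |\kappa_j|]$$
$$= (1/(4\epsilon^2)) \cdot E[|\kappa_j| \log |\kappa_j|] \pm O(1/\epsilon) \cdot E[|\kappa_j| \log |\kappa_j|], \qquad (34)$$

where $z_2 = E[|\kappa_j| \log |\kappa_j|] = O(\gamma k \log k)$ can be computed exactly. Then the second term in (34) is at most
$O(\gamma k \log k/\epsilon) = o(k/\epsilon)$, and thus can be dropped. Let $r_2 = (1/(4\epsilon^2)) \cdot z_2 = O(\gamma k \log k/\epsilon^2)$. By Equations
(30), (29), (31), (32), (33), (34) we can write

$$H_0 = \log(Q \cdot k + U \cdot k/2 + r_1) + \frac{Q \cdot k \log k + U \cdot (k/2) \log(k/2) + r_2}{Q \cdot k + U \cdot k/2 + r_1}. \qquad (35)$$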


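Let $U = 1/(2\epsilon^2) + U'$ and $Q = 1/(4\epsilon^2) + Q'$, and thus $U' = O(1/\epsilon)$ and $Q' = O(1/\epsilon)$. Next we convert
the RHS of (35) to a linear function of $U'$ and $Q'$.

$$H_0 = \log\Big(\frac{k}{2\epsilon^2} + Q'k + \frac{U'k}{2} + r_1\Big) + \frac{\big(\frac{1}{4\epsilon^2} + Q'\big) \cdot k \log k + \big(\frac{1}{2\epsilon^2} + U'\big) \cdot \frac{k}{2} \log \frac{k}{2} + r_2}{\frac{k}{2\epsilon^2} + Q'k + \frac{U'k}{2} + r_1} \qquad (36)$$

$$= \log\Big(\frac{k}{2\epsilon^2} + r_1\Big) + \log\Big(1 + (U' + 2Q') \cdot \frac{\epsilon^2}{1 + 2\epsilon^2 r_1/k}\Big) + \Big(\big(\tfrac{1}{4\epsilon^2} + Q'\big) \cdot k \log k + \big(\tfrac{1}{2\epsilon^2} + U'\big) \cdot \tfrac{k}{2} \log \tfrac{k}{2} + r_2\Big) \cdot \frac{2\epsilon^2}{k + 2\epsilon^2 r_1} \cdot \Big(1 - (U' + 2Q') \cdot \frac{\epsilon^2}{1 + 2\epsilon^2 r_1/k}\Big) \pm O(\epsilon^2) \qquad (37)$$

$$= \log\Big(\frac{k}{2\epsilon^2} + r_1\Big) + (U' + 2Q') \cdot \frac{\epsilon^2}{1 + 2\epsilon^2 r_1/k} + \Big(\Big(\frac{k(2\log k - 1)}{4\epsilon^2} + r_2\Big) + Q' \cdot k \log k + U' \cdot \frac{k}{2} \log \frac{k}{2}\Big) \cdot \frac{2\epsilon^2}{k + 2\epsilon^2 r_1} \cdot \Big(1 - U' \cdot \frac{\epsilon^2}{1 + 2\epsilon^2 r_1/k} - Q' \cdot \frac{2\epsilon^2}{1 + 2\epsilon^2 r_1/k}\Big) \qquad (38)$$

$$= \alpha_1 + \alpha_2 U' + \alpha_3 Q', \qquad (39)$$

up to $o(\epsilon)$ factors (see below for discussion), where

$$\alpha_1 = \log\Big(\frac{k}{2\epsilon^2} + r_1\Big) + \Big(\frac{k(2\log k - 1)}{4\epsilon^2} + r_2\Big) \cdot \frac{2\epsilon^2}{k + 2\epsilon^2 r_1},$$

$$\alpha_2 = \frac{\epsilon^2}{1 + 2\epsilon^2 r_1/k} + \Big(\frac{k}{2} \log \frac{k}{2} - \Big(\frac{k(2\log k - 1)}{4\epsilon^2} + r_2\Big) \cdot \frac{\epsilon^2}{1 + 2\epsilon^2 r_1/k}\Big) \cdot \frac{2\epsilon^2}{k + 2\epsilon^2 r_1},$$

$$\alpha_3 = \frac{2\epsilon^2}{1 + 2\epsilon^2 r_1/k} + \Big(k \log k - \Big(\frac{k(2\log k - 1)}{4\epsilon^2} + r_2\Big) \cdot \frac{2\epsilon^2}{1 + 2\epsilon^2 r_1/k}\Big) \cdot \frac{2\epsilon^2}{k + 2\epsilon^2 r_1}. \qquad (40)$$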

From (36) to (37) we use the fact that $1/(1 + \epsilon) = 1 - \epsilon + O(\epsilon^2)$. From (37) to (38) we use the fact
that $\log(1 + \epsilon) = \epsilon + O(\epsilon^2)$. From (38) to (39) we use the fact that all terms in the form of $U'Q'$, $U'^2$, $Q'^2$
are at most $\pm o(\epsilon)$ (we are assuming $O(\epsilon^2 \log k) = o(\epsilon)$ which is fine since we are neglecting polylog($N$)
factors), therefore we can drop all of them together with the other $\pm O(\epsilon^2)$ terms, and consequently obtain a
linear function on $U'$ and $Q'$.

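Next we calculate $H_1$, and the calculation of $H_2$ will be exactly the same. The values $t_1, t_2$ used in
the following expressions are essentially the same as $r_1, r_2$ used for calculating $H_0$, with $t_1 = \Theta(\gamma k)$ and
$t_2 = \Theta(\gamma k \log k/\epsilon^2)$. Set $U_1' = U_1 - 1/(4\epsilon^2)$ and $U_2' = U_2 - 1/(4\epsilon^2)$.

$$H_1 = \log\big((Q + U_1)k/2 + t_1\big) + \frac{Q \cdot \frac{k}{2} \log \frac{k}{2} + U_1 \cdot \frac{k}{2} \log \frac{k}{2} + t_2}{(Q + U_1)k/2 + t_1}$$

$$= \log\big(k/(4\epsilon^2) + (Q' + U_1')k/2 + t_1\big) + \frac{Q' \cdot \frac{k}{2} \log \frac{k}{2} + U_1' \cdot \frac{k}{2} \log \frac{k}{2} + \frac{k \log(k/2)}{4\epsilon^2} + t_2}{k/(4\epsilon^2) + (Q' + U_1')k/2 + t_1}$$

$$= \log\big(k/(4\epsilon^2) + t_1\big) + (Q' + U_1') \cdot \frac{2\epsilon^2}{1 + 4\epsilon^2 t_1/k} + \Big(Q' \cdot \frac{k}{2} \log \frac{k}{2} + U_1' \cdot \frac{k}{2} \log \frac{k}{2} + \frac{k \log(k/2)}{4\epsilon^2} + t_2\Big) \cdot \frac{4\epsilon^2}{k + 4\epsilon^2 t_1} \cdot \Big(1 - \frac{Q' \cdot 2\epsilon^2 + U_1' \cdot 2\epsilon^2}{1 + 4\epsilon^2 t_1/k}\Big)$$

$$= \beta_1 + \beta_2 U_1' + \beta_3 Q', \qquad (41)$$

where

$$\beta_1 = \log\big(k/(4\epsilon^2) + t_1\big) + \Big(\frac{k \log(k/2)}{4\epsilon^2} + t_2\Big) \cdot \frac{4\epsilon^2}{k + 4\epsilon^2 t_1},$$

$$\beta_2 = \frac{2\epsilon^2}{1 + 4\epsilon^2 t_1/k} + \Big(\frac{k}{2} \log \frac{k}{2} - \Big(\frac{k \log(k/2)}{4\epsilon^2} + t_2\Big) \cdot \frac{2\epsilon^2}{1 + 4\epsilon^2 t_1/k}\Big) \cdot \frac{4\epsilon^2}{k + 4\epsilon^2 t_1},$$

$$\beta_3 = \frac{2\epsilon^2}{1 + 4\epsilon^2 t_1/k} + \Big(\frac{k}{2} \log \frac{k}{2} - \Big(\frac{k \log(k/2)}{4\epsilon^2} + t_2\Big) \cdot \frac{2\epsilon^2}{1 + 4\epsilon^2 t_1/k}\Big) \cdot \frac{4\epsilon^2}{k + 4\epsilon^2 t_1}. \qquad (42)$$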

By the same calculation we can obtain the following equation for H2. 

H2 = /3i + /32^7^ + /33Q'. (43) 
Note that [/' = [/{ + J/g- Combining (41) and (43) we have 

H1 + H2 = 2P1+P2U' + 2p^Q'. (44) 
It is easy to verify that Equations (39) and (44) are linearly independent: by direct calculation (notice that
$r_1, r_2$ are lower-order terms) we obtain $\alpha_2 = \frac{\epsilon^2}{2}(1 \pm o(1))$ and $\alpha_3 = 3\epsilon^2(1 \pm o(1))$. Therefore $\alpha_2/\alpha_3 =
(1 \pm o(1))/6$. Similarly we can obtain $\beta_2/(2\beta_3) = (1 \pm o(1))/2$. Therefore the two equations are linearly
independent. Furthermore, we can compute all the coefficients $\alpha_1, \alpha_2, \alpha_3, \beta_1, \beta_2, \beta_3$ up to a $(1 \pm o(\epsilon))$
factor. Thus if we have $\alpha\epsilon$ additive approximations of $H_0, H_1, H_2$ for a sufficiently small constant $\alpha$, then we
can estimate $U'$ (and thus $U$) up to an additive error of $\alpha'/\epsilon$ for a sufficiently small constant $\alpha'$ by Equations
(39) and (44), and therefore solve $k$-BTX. This completes the proof. ■
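As a rough numerical illustration of this last step, the following sketch solves the two linear equations (39) and (44) for $U'$ and $Q'$ by Cramer's rule. It keeps only leading-order coefficients: $\alpha_2 \approx \epsilon^2/2$ and $\alpha_3 \approx 3\epsilon^2$ as stated above, while $\beta_2 \approx \beta_3 \approx 2\epsilon^2$ is our own leading-order reading of (42) and should be treated as an assumption. The function name and its parameters are hypothetical, and $\alpha_1, \beta_1$ are simply passed in.

```python
def solve_for_u_and_q(h0, h12, alpha1, beta1, eps):
    """Solve the 2x2 system
         h0  = alpha1   + a2 * U' + a3 * Q'        (Equation (39))
         h12 = 2 * beta1 + b2 * U' + 2 * b3 * Q'   (Equation (44))
    for (U', Q'), using leading-order coefficients only."""
    a2, a3 = eps ** 2 / 2, 3 * eps ** 2
    # Assumption: leading-order values read off from (42).
    b2, b3 = 2 * eps ** 2, 2 * eps ** 2

    # Determinant of [[a2, a3], [b2, 2*b3]]; equals -4 * eps**4 here,
    # so the system is non-singular for any eps != 0.
    det = a2 * (2 * b3) - a3 * b2
    if det == 0:
        raise ValueError("equations are linearly dependent")

    c0 = h0 - alpha1        # right-hand side of (39) minus its constant
    c1 = h12 - 2 * beta1    # right-hand side of (44) minus its constant

    # Cramer's rule.
    u_prime = (c0 * (2 * b3) - a3 * c1) / det
    q_prime = (a2 * c1 - b2 * c0) / det
    return u_prime, q_prime
```

Because every coefficient is $\Theta(\epsilon^2)$, an additive error of order $\epsilon$ in $H_0$ and $H_1 + H_2$ turns into an additive error of order $1/\epsilon$ in the recovered $U'$, which is the scaling used in the proof.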

6.3 $\ell_p$ for any constant $p > 1$

Consider an $n$-dimensional vector $x$ with integer entries. It is well-known that for a vector $v$ of $n$ i.i.d.
$N(0, 1)$ random variables, $\langle v, x \rangle \sim N(0, \|x\|_2^2)$. Hence, for any real $p > 0$, $E[|\langle v, x \rangle|^p] = \|x\|_2^p G_p$,
where $G_p > 0$ is the $p$-th moment of the standard half-normal distribution (see [1] for a formula for these
moments in terms of confluent hypergeometric functions). Let $r = O(\epsilon^{-2})$, and let $v^1, \ldots, v^r$ be independent
$n$-dimensional vectors of i.i.d. $N(0, 1)$ random variables. Let $y_j = \langle v^j, x \rangle / G_p^{1/p}$, so that $y = (y_1, \ldots, y_r)$.
By Chebyshev's inequality, for $r = O(\epsilon^{-2})$ sufficiently large, $\|y\|_p = (1 \pm \epsilon/3)\|x\|_2$ with probability at
least $1 - c$ for an arbitrarily small constant $c > 0$.

We thus have the following reduction, which shows that estimating $\ell_p$ up to a $(1 + \epsilon)$-factor requires
$\Omega(k/\epsilon^2)$ communication complexity for any $p > 0$. Let the $k$ parties have respective inputs $x^1, \ldots, x^k$, and
let $x = \sum_{i=1}^{k} x^i$. The parties use the shared randomness to choose shared vectors $v^1, \ldots, v^r$ as described
above. For $i = 1, \ldots, k$ and $j = 1, \ldots, r$, let $y^i_j = \langle v^j, x^i \rangle / G_p^{1/p}$, so that $y^i = (y^i_1, \ldots, y^i_r)$. Let $y =
\sum_{i=1}^{k} y^i$. By the above, $\|y\|_p = (1 \pm \epsilon/3)\|x\|_2$ with probability at least $1 - c$ for an arbitrarily small
constant $c > 0$. We note that the entries of the $v^j$ can be discretized to $O(\log n)$ bits, changing the $p$-norm
of $y$ by only a $(1 \pm O(1/n))$ factor, which we ignore.
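The reduction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the protocol itself: a seeded generator stands in for the shared randomness, the Gaussians are not discretized, all function names are hypothetical, and each coordinate is divided by $(r G_p)^{1/p}$ rather than $G_p^{1/p}$. The extra $r^{1/p}$ factor is our assumption, added so that $\|y\|_p$ concentrates around $\|x\|_2$ itself rather than around $r^{1/p}\|x\|_2$. The closed form used for $G_p$ is the standard absolute moment of a Gaussian, $2^{p/2}\,\Gamma((p+1)/2)/\sqrt{\pi}$.

```python
import math
import random


def half_normal_moment(p):
    """G_p = E|Z|^p for Z ~ N(0, 1)."""
    return 2 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)


def shared_gaussians(r, n, seed):
    """r vectors of n i.i.d. N(0, 1) entries; the seed models shared randomness."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(r)]


def local_sketch(x_i, vs, p):
    """Party i's vector y^i: one scaled inner product <v^j, x^i> per shared v^j."""
    # Assumption: the r^(1/p) normalization is folded into the scale.
    scale = (len(vs) * half_normal_moment(p)) ** (1.0 / p)
    return [sum(a * b for a, b in zip(v, x_i)) / scale for v in vs]


def p_norm(y, p):
    return sum(abs(t) ** p for t in y) ** (1.0 / p)


def estimate_l2_via_lp(inputs, p, r, seed=0):
    """Estimate ||sum_i x^i||_2 as the p-norm of the summed local sketches."""
    n = len(inputs[0])
    vs = shared_gaussians(r, n, seed)
    sketches = [local_sketch(x_i, vs, p) for x_i in inputs]
    # y = sum_i y^i; by linearity this is the sketch of x = sum_i x^i.
    y = [sum(column) for column in zip(*sketches)]
    return p_norm(y, p)
```

Each party only needs its own input and the shared vectors, so any protocol that approximates the $p$-norm of the sum of the parties' $r$-dimensional vectors can be run directly on the local sketches.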

Hence, given a randomized protocol for estimating $\|y\|_p$ up to a $(1 + \epsilon/3)$ factor with probability $1 - \delta$,
and given that the parties have respective inputs $y^1, \ldots, y^k$, this implies a randomized protocol for estimating
$\|x\|_2$ up to a $(1 \pm \epsilon/3) \cdot (1 \pm \epsilon/3) = (1 \pm \epsilon)$ factor with probability at least $1 - \delta - c$, and hence a protocol
for estimating $\ell_2$ up to a $(1 \pm \epsilon)$ factor with this probability. The communication complexity of the protocol
for $\ell_2$ is the same as that for $\ell_p$. By our communication lower bound for estimating $\ell_2$ (in fact, for estimating
$F_2$ in which all coordinates of $x$ are non-negative), this implies the following theorem.

Theorem 11 The randomized communication complexity of approximating the $\ell_p$-norm, $p > 1$, up to a
factor of $1 + \epsilon$ with constant probability, is $\Omega(k/\epsilon^2)$.
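
The composition of the two $(1 \pm \epsilon/3)$ errors used in the reduction above can be checked directly. A short verification, assuming $0 < \epsilon \le 1$ (a range we fix here for concreteness):

\[
\left(1 + \frac{\epsilon}{3}\right)^2 = 1 + \frac{2\epsilon}{3} + \frac{\epsilon^2}{9} \le 1 + \frac{2\epsilon}{3} + \frac{\epsilon}{9} = 1 + \frac{7\epsilon}{9} \le 1 + \epsilon,
\qquad
\left(1 - \frac{\epsilon}{3}\right)^2 = 1 - \frac{2\epsilon}{3} + \frac{\epsilon^2}{9} \ge 1 - \frac{2\epsilon}{3} \ge 1 - \epsilon.
\]

So the product of the two approximation factors always lies within $(1 \pm \epsilon)$.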